In this work, we introduce AutoFragDiff, a fragment-based autoregressive
diffusion model for generating 3D molecular structures conditioned on target
protein structures. We employ geometric vector perceptrons to predict atom
types and spatial coordinates of new molecular fragments conditioned on
molecular scaffolds and protein pockets. Our approach improves the local
geometry of the resulting 3D molecules while maintaining high predicted binding
affinity to protein targets. The model can also perform scaffold extension from
a user-provided starting molecular scaffold.
( 2 min )
Auditory spatial attention detection (ASAD) is used to determine the
direction of a listener's attention to a speaker by analyzing their
electroencephalographic (EEG) signals. This study aimed to further improve the
performance of ASAD with a short decision window (i.e., <1 s) rather than with
long decision windows in previous studies. An end-to-end temporal attention
network (i.e., TAnet) was introduced in this work. TAnet employs a multi-head
attention (MHA) mechanism, which can more effectively capture the interactions
among time steps in collected EEG signals and efficiently assign corresponding
weights to those EEG time steps. Experiments demonstrated that, compared with
CNN-based and other recent ASAD methods, TAnet provided improved decoding
performance on the KUL dataset with short decision windows (i.e., <1 s),
achieving accuracies of 92.4% (decision window 0.1 s), 94.9% (0.25 s), 95.1%
(0.3 s), 95.4% (0.4 s), and 95.5% (0.5 s). As a new ASAD model with a short
decision window, TAnet can potentially facilitate the design of EEG-controlled
intelligent hearing aids and sound recognition systems.
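As an illustrative sketch only (not the authors' TAnet implementation; the channel count, model width, and head count are assumptions), a multi-head attention decoder over EEG time steps could look like this in PyTorch:

    import torch
    import torch.nn as nn

    class MHADecoder(nn.Module):
        """Minimal sketch of MHA-based spatial-attention decoding.
        Input: EEG windows of shape (batch, time, channels)."""
        def __init__(self, n_channels=64, d_model=64, n_heads=4, n_classes=2):
            super().__init__()
            self.proj = nn.Linear(n_channels, d_model)   # per-time-step embedding
            self.mha = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
            self.head = nn.Linear(d_model, n_classes)    # attended-speaker direction

        def forward(self, x):                            # x: (B, T, C)
            h = self.proj(x)
            h, _ = self.mha(h, h, h)                     # interactions among time steps
            return self.head(h.mean(dim=1))              # pool over time, classify

    logits = MHADecoder()(torch.randn(8, 13, 64))        # e.g. a 0.1 s window at 128 Hz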
( 2 min )
While fingerprinting localization is favored for its effectiveness, it is
hindered by high data acquisition costs and the inaccuracy of static
database-based estimates. Addressing these issues, this letter presents an
innovative indoor localization method using a data-efficient meta-learning
algorithm. This approach, grounded in the ``Learning to Learn'' paradigm of
meta-learning, utilizes historical localization tasks to improve adaptability
and learning efficiency in dynamic indoor environments. We introduce a
task-weighted loss to enhance knowledge transfer within this framework. Our
comprehensive experiments confirm the method's robustness and superiority over
current benchmarks, achieving a notable 23.13\% average improvement in Mean
Euclidean Distance and proving particularly effective in scenarios with
limited CSI data.
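The letter's exact algorithm is not reproduced here; as a hedged sketch of the general idea, a first-order (Reptile-style) meta-update in which each historical localization task contributes proportionally to a task weight could look as follows (the weighting scheme and hyperparameters are placeholders):

    import torch

    def task_weighted_reptile(model, tasks, weights,
                              inner_steps=5, inner_lr=1e-2, meta_lr=0.1):
        """First-order meta-update where each historical localization task
        contributes proportionally to its weight. `tasks` yields (X, y)
        fingerprint batches; the weighting is a placeholder, not the
        letter's task-weighted loss."""
        meta_params = [p.detach().clone() for p in model.parameters()]
        update = [torch.zeros_like(p) for p in meta_params]
        for (X, y), w in zip(tasks, weights):
            # reset to meta parameters, then adapt on this task
            for p, mp in zip(model.parameters(), meta_params):
                p.data.copy_(mp)
            opt = torch.optim.SGD(model.parameters(), lr=inner_lr)
            for _ in range(inner_steps):
                loss = torch.nn.functional.mse_loss(model(X), y)
                opt.zero_grad(); loss.backward(); opt.step()
            for u, p, mp in zip(update, model.parameters(), meta_params):
                u += w * (p.detach() - mp)               # weighted task direction
        for p, mp, u in zip(model.parameters(), meta_params, update):
            p.data.copy_(mp + meta_lr * u)

    # Usage sketch: model maps CSI fingerprints to 2-D positions.
    model = torch.nn.Sequential(torch.nn.Linear(30, 64), torch.nn.ReLU(),
                                torch.nn.Linear(64, 2))
    tasks = [(torch.randn(32, 30), torch.randn(32, 2)) for _ in range(4)]
    task_weighted_reptile(model, tasks, weights=[0.4, 0.3, 0.2, 0.1])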
( 2 min )
Idealized first-principles models of chemical plants can be inaccurate. An
alternative is to fit a Machine Learning (ML) model directly to plant sensor
data. We use a structured approach: Each unit within the plant gets represented
by one ML model. After fitting the models to the data, the models are connected
into a flowsheet-like directed graph. We find that for smaller plants, this
approach works well, but for larger plants, the complex dynamics arising from
large and nested cycles in the flowsheet lead to instabilities in the solver
during model initialization. We show that a high accuracy of the single-unit
models is not enough: The gradient can point in unexpected directions, which
prevents the solver from converging to the correct stationary state. To address
this problem, we present a way to fine-tune ML models such that initialization,
even with very simple solvers, becomes robust.
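A toy illustration of the setup, with linear stand-ins for the fitted ML unit models and a recycle loop solved as a fixed point (all numbers are invented):

    import numpy as np
    from scipy.optimize import fsolve

    # Toy recycle flowsheet: reactor and separator surrogates (stand-ins
    # for fitted ML unit models) connected in a cycle:
    # feed + recycle -> reactor -> separator -> (product, recycle).
    def reactor(inlet):                 # converts 70% of component A into B
        return inlet * np.array([0.3, 1.0]) + np.array([0.0, 0.7 * inlet[0]])

    def separator(outlet):              # recycles 40% of the outlet stream
        return 0.4 * outlet

    feed = np.array([10.0, 0.0])        # [A, B] molar flows

    def residual(recycle_guess):        # stationary state: recycle is a fixed point
        return separator(reactor(feed + recycle_guess)) - recycle_guess

    recycle = fsolve(residual, x0=np.zeros(2))
    print("converged recycle stream:", recycle)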
( 3 min )
This research examines the polycentric governance of digital assets in
blockchain-based Decentralized Autonomous Organizations (DAOs). It offers a
theoretical framework and addresses a critical challenge facing decentralized
governance by developing a method to identify sybils, or spurious identities.
The method uses graph deep learning techniques to identify sybil activity in a
DAO governance dataset (snapshot.org). Specifically, a Graph Convolutional
Neural Network (GCNN) learned voting behaviours, and fast k-means vector
clustering (via FAISS) used the high-dimensional embeddings to identify
similar nodes in the graph. The results reveal that deep learning can effectively
identify sybils, reducing the voting graph by 2-5%. This research underscores
the importance of sybil resistance in DAOs and offers a novel perspective on
decentralized governance, informing future policy, regulation, and governance
practices.
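A minimal sketch of the pipeline's two stages, with one numpy GCN propagation step and scikit-learn's KMeans standing in for FAISS (graph, features, and weights are illustrative, and the layer is untrained):

    import numpy as np
    from sklearn.cluster import KMeans

    # Toy voting graph: adjacency among 6 wallet nodes (illustrative).
    A = np.array([[0,1,1,0,0,0],
                  [1,0,1,0,0,0],
                  [1,1,0,0,0,0],
                  [0,0,0,0,1,1],
                  [0,0,0,1,0,1],
                  [0,0,0,1,1,0]], dtype=float)
    X = np.random.default_rng(0).normal(size=(6, 8))   # per-node voting features

    # One GCN layer: H = ReLU(D^-1/2 (A+I) D^-1/2 X W)
    A_hat = A + np.eye(6)
    d = A_hat.sum(axis=1)
    A_norm = A_hat / np.sqrt(np.outer(d, d))
    W = np.random.default_rng(1).normal(size=(8, 4))   # placeholder weights
    H = np.maximum(A_norm @ X @ W, 0.0)                # node embeddings

    # Cluster embeddings; tight clusters flag coordinated (sybil-like) voters.
    labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(H)
    print(labels)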
( 2 min )
Federated learning (FL) emphasizes decentralized training by storing data
locally and sending only model updates, underscoring user privacy. Recently, a
line of work on privacy attacks has undermined user privacy by extracting
sensitive training text from language models in the context of FL. Yet, these attack
techniques face distinct hurdles: some work chiefly with limited batch sizes
(e.g., batch size of 1), and others are easily detectable. This paper
introduces an innovative approach that is challenging to detect, significantly
enhancing the recovery rate of text in various batch-size settings. Building on
fundamental gradient matching and domain prior knowledge, we enhance the attack
by recovering the input of the Pooler layer of language models, which enables
us to provide additional supervised signals at the feature level. Unlike
gradient data, these signals do not average across sentences and tokens,
thereby offering more nuanced and effective insights. We benchmark our method
using text classification tasks on datasets such as CoLA, SST-2, and Rotten
Tomatoes. Across different batch sizes and models, our approach consistently
outperforms previous state-of-the-art results.
( 2 min )
We consider the problem of sequentially learning to estimate, in the mean
squared error (MSE) sense, a Gaussian $K$-vector of unknown covariance by
observing only $m < K$ of its entries in each round. We first establish a
concentration bound for MSE estimation. We then frame the estimation problem
with bandit feedback, and propose a variant of the successive elimination
algorithm. We also derive a minimax lower bound to understand the fundamental
limit on the sample complexity of this problem.
( 2 min )
Diffuse correlation spectroscopy (DCS) is an emerging noninvasive technique
that measures tissue blood flow by using near-infrared coherent
point-source illumination to detect spectral changes. While machine learning
has demonstrated significant potential for measuring blood flow index (BFi), an
open question concerning the success of this approach pertains to its
robustness in scenarios involving deviations between datasets with varying
Signal-to-Noise Ratios (SNRs) originating from diverse clinical applications
and various setups. This study proposes a transfer learning approach that aims
to assess the influence of SNR on the generalization ability of learned
features and to demonstrate the robustness of transfer learning. A synthetic dataset with
varying levels of added noise is utilized to simulate different SNRs. The
proposed network takes a 1x64 autocorrelation curve as input and generates BFi
and the correlation parameter beta. The proposed model demonstrates excellent
performance across different SNRs, exhibiting enhanced fitting accuracy,
particularly for low SNR datasets when compared with other fitting methods.
This highlights its potential for clinical diagnosis and treatment across
various scenarios under different clinical setups.
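An illustrative sketch of such a network and the noise-augmentation idea (layer sizes and noise level are assumptions, not the paper's architecture):

    import torch
    import torch.nn as nn

    class BFiNet(nn.Module):
        """Sketch: map a 1x64 autocorrelation curve to (BFi, beta).
        Layer sizes are illustrative, not the paper's architecture."""
        def __init__(self):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(64, 128), nn.ReLU(),
                nn.Linear(128, 64), nn.ReLU(),
                nn.Linear(64, 2),            # outputs: [BFi, beta]
            )

        def forward(self, g2):
            return self.net(g2)

    # Transfer-learning flavour: pretrain on clean synthetic curves, then
    # fine-tune on noise-augmented curves mimicking a lower-SNR setup.
    model = BFiNet()
    g2_clean = torch.rand(32, 64)
    g2_noisy = g2_clean + 0.05 * torch.randn_like(g2_clean)
    pred = model(g2_noisy)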
( 2 min )
This paper explores the application of Shapley Value Regression in dissecting
marketing performance at channel-partner level, complementing channel-level
Marketing Mix Modeling (MMM). Utilizing real-world data from the financial
services industry, we demonstrate the practicality of Shapley Value Regression
in evaluating individual partner contributions. Although structured in-field
testing along with cooperative game theory is most accurate, it can often be
highly complex and expensive to conduct. Shapley Value Regression is thus a
more feasible approach to disentangle the influence of each marketing partner
within a marketing channel. We also propose a simple method to derive adjusted
coefficients of Shapley Value Regression and compare it with alternative
approaches.
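For a handful of partners within one channel, exact Shapley decomposition of a regression's $R^2$ is tractable by enumerating subsets; a hedged sketch on synthetic data with plain OLS:

    from itertools import combinations
    from math import factorial

    import numpy as np
    from sklearn.linear_model import LinearRegression

    def r2(X, y, subset):
        """R^2 of an OLS fit using only the given partner columns."""
        if not subset:
            return 0.0
        model = LinearRegression().fit(X[:, list(subset)], y)
        return model.score(X[:, list(subset)], y)

    def shapley_r2(X, y):
        """Exact Shapley decomposition of R^2 across predictors."""
        n = X.shape[1]
        values = np.zeros(n)
        for i in range(n):
            others = [j for j in range(n) if j != i]
            for k in range(n):
                for S in combinations(others, k):
                    w = factorial(k) * factorial(n - k - 1) / factorial(n)
                    values[i] += w * (r2(X, y, S + (i,)) - r2(X, y, S))
        return values

    # Illustrative data: 4 partner spends driving a response metric.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 4))
    y = X @ np.array([0.8, 0.3, 0.0, 0.5]) + 0.1 * rng.normal(size=200)
    print(shapley_r2(X, y))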
( 2 min )
Representation learning frameworks for unlabeled time series have been
proposed for medical signal processing. Despite the considerable progress made
in previous works, we observe that the learned time series representations
still do not generalize well. In this paper, we present a Time series (medical
signal) Representation Learning framework via Spectrogram (TRLS) to obtain
more informative representations. We transform the input time-domain medical
signals into spectrograms and design a time-frequency encoder named Time
Frequency RNN (TFRNN) to capture more robust multi-scale representations from
the augmented spectrograms. TRLS takes spectrograms produced by two different
data augmentations as input and maximizes the similarity between the resulting
positive pairs, which effectively circumvents the problem of designing
negative samples. Our evaluation on four real-world medical signal datasets
focusing on medical signal classification shows that TRLS is superior to
existing frameworks.
( 2 min )
Electromyogram (EMG)-based hand gesture recognition systems are a promising
technology for human/machine interfaces. However, one of their main limitations
is the long calibration time that is typically required to handle new users.
The paper discusses and analyses the challenge of cross-subject generalization
using an original dataset containing the EMG signals of 14 human subjects
during hand gestures. The experimental results show that, though an accurate
generalization based on pooling multiple subjects is hardly achievable, it is
possible to improve the cross-subject estimation by identifying a robust
low-dimensional subspace for multiple subjects and aligning it to a target
subject. A visualization of the subspace enables us to provide insights for the
improvement of cross-subject generalization with EMG signals.
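One plausible reading of the subspace idea (not the paper's exact procedure): PCA on pooled source subjects, PCA on the target subject, and an orthogonal Procrustes rotation to align the two bases (the subspace dimension is an illustrative choice):

    import numpy as np
    from scipy.linalg import orthogonal_procrustes
    from sklearn.decomposition import PCA

    def aligned_subspace(source_feats, target_feats, dim=10):
        """Fit a low-dimensional subspace on pooled source subjects and
        align it to a target subject's subspace."""
        src = PCA(n_components=dim).fit(source_feats)   # pooled-subject basis
        tgt = PCA(n_components=dim).fit(target_feats)   # target-subject basis
        # Rotation mapping source components onto target components.
        R, _ = orthogonal_procrustes(src.components_.T, tgt.components_.T)
        return src.components_.T @ R                    # (n_features, dim)

    rng = np.random.default_rng(0)
    basis = aligned_subspace(rng.normal(size=(500, 64)),   # pooled EMG features
                             rng.normal(size=(100, 64)))   # target-subject features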
( 2 min )
Alternative data representations are powerful tools that augment the
performance of downstream models. However, there is an abundance of such
representations within the machine learning toolbox, and the field lacks a
comparative understanding of the suitability of each representation method.
In this paper, we propose artifact detection and classification within EEG
data as a testbed for profiling image-based data representations of time series
data. We then evaluate eleven popular deep learning architectures on each of
six commonly-used representation methods.
We find that, while the choice of representation entails a tradeoff between
bias and variance, certain representations are practically
more effective in highlighting features which increase the signal-to-noise
ratio of the data. We present our results on EEG data, and open-source our
testing framework to enable future comparative analyses in this vein.
( 2 min )
This paper describes an architecture for predicting the price of
cryptocurrencies for the next seven days using the Adaptive Network Based Fuzzy
Inference System (ANFIS). Historical data of cryptocurrencies and indexes that
are considered are Bitcoin (BTC), Ethereum (ETH), Bitcoin Dominance (BTC.D),
and Ethereum Dominance (ETH.D) in a daily timeframe. The models are trained
with hybrid and backpropagation algorithms, and grid partition, subtractive
clustering, and Fuzzy C-means (FCM) algorithms are used for data clustering.
The performance of the proposed architecture is compared against different
inputs and neural network models in terms of statistical evaluation criteria.
Finally, the proposed method can predict the price of digital currencies over
a short horizon.
( 2 min )
Spectral lightcurves consisting of time series single-pixel spectral
measurements of spacecraft are used to infer the spacecraft's attitude and
rotation. Two methods are used: one based on numerical optimisation of a
regularised least squares cost function, and another based on machine learning
with a neural network model. The aim is to work with minimal information, so no
prior is available on the attitude or the inertia tensor. The theoretical and
practical aspects of this task are investigated, and the methodology is
demonstrated on synthetic data.
( 2 min )
The field of antibody-based therapeutics has grown significantly in recent
years, with targeted antibodies emerging as a potentially effective approach to
personalized therapies. Such therapies could be particularly beneficial for
complex, highly individual diseases such as cancer. However, progress in this
field is often constrained by the extensive search space of amino acid
sequences that form the foundation of antibody design. In this study, we
introduce a novel reinforcement learning method specifically tailored to
address the unique challenges of this domain. We demonstrate that our method
can learn the design of high-affinity antibodies against multiple targets in
silico, utilizing either online interaction or offline datasets. To the best of
our knowledge, our approach is the first of its kind and outperforms existing
methods on all tested antigens in the Absolut! database.
( 2 min )
The advent of Generative AI has marked a significant milestone in artificial
intelligence, demonstrating remarkable capabilities in generating realistic
images, texts, and data patterns. However, these advancements come with
heightened concerns over data privacy and copyright infringement, primarily due
to the reliance on vast datasets for model training. Traditional approaches
like differential privacy, machine unlearning, and data poisoning only offer
fragmented solutions to these complex issues. Our paper delves into the
multifaceted challenges of privacy and copyright protection within the data
lifecycle. We advocate for integrated approaches that combine technical
innovation with ethical foresight, holistically addressing these concerns by
investigating and devising solutions that are informed by the lifecycle
perspective. This work aims to catalyze a broader discussion and inspire
concerted efforts towards data privacy and copyright integrity in Generative
AI.
( 2 min )
Automated sleep stage classification using raw single-channel EEG is a
critical tool for sleep quality assessment and disorder diagnosis. However,
modelling the complexity and variability inherent in this signal is a
challenging task, limiting the practicality and effectiveness of such models in
clinical settings. To mitigate these challenges, this study presents an end-to-end deep
learning (DL) model which integrates squeeze and excitation blocks within the
residual network to extract features and stacked Bi-LSTM to understand complex
temporal dependencies. A distinctive aspect of this study is the adaptation of
GradCAM for sleep staging, marking the first instance of an explainable DL
model in this domain whose decision-making aligns with sleep experts'
insights. We evaluated our model on the publicly available datasets
(SleepEDF-20, SleepEDF-78, and SHHS), achieving Macro-F1 scores of 82.5, 78.9,
and 81.9, respectively. Additionally, a novel training efficiency enhancement
strategy was implemented by increasing stride size, leading to 8x faster
training times with minimal impact on performance. Comparative analyses
underscore that our model outperforms all existing baselines, indicating its
potential for clinical usage.
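For concreteness, a squeeze-and-excitation block for 1-D EEG feature maps, as commonly defined (the reduction ratio is an illustrative choice, not necessarily the paper's):

    import torch
    import torch.nn as nn

    class SEBlock1d(nn.Module):
        """Squeeze-and-excitation for 1-D feature maps: global-average
        'squeeze', two-layer 'excitation', channel-wise rescaling."""
        def __init__(self, channels, reduction=8):
            super().__init__()
            self.fc = nn.Sequential(
                nn.Linear(channels, channels // reduction), nn.ReLU(),
                nn.Linear(channels // reduction, channels), nn.Sigmoid(),
            )

        def forward(self, x):                 # x: (batch, channels, time)
            s = x.mean(dim=2)                 # squeeze over time
            w = self.fc(s).unsqueeze(-1)      # per-channel gates in (0, 1)
            return x * w                      # recalibrate feature maps

    out = SEBlock1d(64)(torch.randn(2, 64, 3000))   # e.g. a 30 s epoch at 100 Hz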
( 3 min )
Photoplethysmography (PPG) refers to the measurement of variations in blood
volume using light and is a feature of most wearable devices. The PPG signals
provide insight into the body's circulatory system and can be employed to
extract various bio-features, such as heart rate and vascular ageing. Although
several algorithms have been proposed for this purpose, many exhibit
limitations, including heavy reliance on human calibration, high signal quality
requirements, and a lack of generalisation. In this paper, we introduce a PPG
signal processing framework that integrates graph theory and computer vision
algorithms to provide an analysis framework which is amplitude-independent and
invariant to affine transformations. It also requires minimal preprocessing,
fuses information through RGB channels and exhibits robust generalisation
across tasks and datasets. The proposed VGTL-net achieves state-of-the-art
performance in the prediction of vascular ageing and demonstrates robust
estimation of continuous blood pressure waveforms.
( 2 min )
For efficient neural network inference, it is desirable to achieve
state-of-the-art accuracy with the simplest networks requiring the least
computation, memory, and power. Quantizing networks to lower precision is a
powerful technique for simplifying networks. As each layer of a network may
have different sensitivity to quantization, mixed precision quantization
methods selectively tune the precision of individual layers to achieve a
minimum drop in task performance (e.g., accuracy). To estimate the impact of
layer precision choice on task performance, two methods are introduced: i)
Entropy Approximation Guided Layer selection (EAGL) is fast and uses the
entropy of the weight distribution, and ii) Accuracy-aware Layer Precision
Selection (ALPS) is straightforward and relies on single epoch fine-tuning
after layer precision reduction. Using EAGL and ALPS for layer precision
selection, full-precision accuracy is recovered with a mix of 4-bit and 2-bit
layers for ResNet-50, ResNet-101 and BERT-base transformer networks,
demonstrating enhanced performance across the entire accuracy-throughput
frontier. The techniques demonstrate better performance than existing
techniques in several commensurate comparisons. Notably, this is accomplished
with significantly less computation time required to reach a solution.
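A hedged sketch of the EAGL ingredient named above: score each layer by the Shannon entropy of its weight histogram and rank layers (the bin count is an assumption, and how scores map to bit widths is a heuristic reading of the abstract):

    import numpy as np
    import torch
    import torchvision

    def weight_entropy(w, bins=256):
        """Shannon entropy of a layer's weight histogram, used here as a
        cheap proxy for sensitivity to precision reduction."""
        hist, _ = np.histogram(w.detach().cpu().numpy().ravel(), bins=bins)
        p = hist / hist.sum()
        p = p[p > 0]
        return float(-(p * np.log2(p)).sum())

    model = torchvision.models.resnet50(weights=None)
    scores = {name: weight_entropy(m.weight)
              for name, m in model.named_modules()
              if isinstance(m, (torch.nn.Conv2d, torch.nn.Linear))}
    # Heuristically, lowest-scoring layers are candidates for 2-bit precision.
    for name, s in sorted(scores.items(), key=lambda kv: kv[1])[:5]:
        print(f"{name}: {s:.2f} bits")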
( 2 min )
The unfolding of detector effects is crucial for the comparison of data to
theory predictions. While traditional methods are limited to representing the
data in a low number of dimensions, machine learning has enabled new unfolding
techniques while retaining the full dimensionality. Generative networks like
invertible neural networks~(INNs) enable a probabilistic unfolding, which maps
individual events to their corresponding unfolded probability distribution. The
accuracy of such methods is however limited by how well simulated training
samples model the actual data that is unfolded. We introduce the iterative
conditional INN~(IcINN) for unfolding that adjusts for deviations between
simulated training samples and data. The IcINN unfolding is first validated on
toy data and then applied to pseudo-data for the $pp \to Z \gamma \gamma$
process.
( 2 min )
The paper shows that Physics-Informed Neural Networks (PINNs) can fail to
estimate the correct Partial Differential Equations (PDEs) dynamics in cases of
unknown changepoints in the parameters. To address this, we propose a new
CP-PINNs model which integrates PINNs with a Total-Variation penalty for
accurate changepoint detection and PDE discovery. In order to optimally combine
the tasks of model fitting, PDE discovery, and changepoint detection, we
develop a new meta-learning algorithm that exploits batch learning to
dynamically refine the optimization objective when moving over consecutive
batches of the data. Empirically, in the case of changepoints in the dynamics,
our approach demonstrates accurate parameter estimation and model alignment,
and in the case of
no changepoints in the data, it converges numerically to the solution from the
original PINNs model.
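A minimal sketch of the Total-Variation ingredient: a piecewise-constant PDE parameter over time bins penalized by its first differences (the PINN residual itself is abstracted away, and the penalty weight is illustrative):

    import torch

    lam = torch.nn.Parameter(torch.randn(20))   # PDE parameter, one value per time bin

    def tv_penalty(theta):
        # Total variation: sum of absolute first differences; small values
        # encourage a piecewise-constant profile with few changepoints.
        return torch.sum(torch.abs(theta[1:] - theta[:-1]))

    def total_loss(residual_loss, theta, weight=1e-2):
        # `residual_loss` stands in for the PINN's PDE-residual loss.
        return residual_loss + weight * tv_penalty(theta)

    loss = total_loss(torch.tensor(0.5), lam)
    loss.backward()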
( 2 min )
Features in images' backgrounds can spuriously correlate with the images'
classes, representing background bias. They can influence the classifier's
decisions, causing shortcut learning (Clever Hans effect). The phenomenon
generates deep neural networks (DNNs) that perform well on standard evaluation
datasets but generalize poorly to real-world data. Layer-wise Relevance
Propagation (LRP) explains DNNs' decisions. Here, we show that the optimization
of LRP heatmaps can minimize the background bias influence on deep classifiers,
hindering shortcut learning. Because it does not increase run-time
computational cost, the approach is light and fast. Furthermore, it applies to
virtually any
classification architecture. After injecting synthetic bias in images'
backgrounds, we compared our approach (dubbed ISNet) to eight state-of-the-art
DNNs, quantitatively demonstrating its superior robustness to background bias.
Mixed datasets are common for COVID-19 and tuberculosis classification with
chest X-rays, fostering background bias. By focusing on the lungs, the ISNet
reduced shortcut learning. Thus, its generalization performance on external
(out-of-distribution) test databases significantly surpassed all implemented
benchmark models.
( 3 min )
Resistive memory is a promising alternative to SRAM, but is also an
inherently unstable device that requires substantial effort to ensure correct
read and write operations. To avoid the associated costs in terms of area, time
and energy, the present work is concerned with exploring how much noise in
memory operations can be tolerated by image classification tasks based on
neural networks. We introduce a special noisy operator that mimics the noise in
an exemplary resistive memory unit, explore the resilience of convolutional
neural networks on the CIFAR-10 classification task, and discuss a couple of
countermeasures to improve this resilience.
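A minimal sketch of such a noisy operator in PyTorch, with a Gaussian read-noise model standing in for the paper's device model (sigma is illustrative):

    import torch
    import torch.nn as nn

    class NoisyLinear(nn.Linear):
        """Every read of the stored weights is perturbed, mimicking read
        noise in a resistive memory cell."""
        def __init__(self, in_features, out_features, sigma=0.05):
            super().__init__(in_features, out_features)
            self.sigma = sigma

        def forward(self, x):
            noisy_w = self.weight + self.sigma * torch.randn_like(self.weight)
            return nn.functional.linear(x, noisy_w, self.bias)

    y = NoisyLinear(128, 10)(torch.randn(4, 128))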
( 2 min )
Machine learning has emerged as a powerful solution to the modern challenges
in accelerator physics. However, the limited availability of beam time, the
computational cost of simulations, and the high-dimensionality of optimisation
problems pose significant challenges in generating the required data for
training state-of-the-art machine learning models. In this work, we introduce
Cheetah, a PyTorch-based high-speed differentiable linear-beam dynamics code.
Cheetah enables the fast collection of large data sets by reducing computation
times by multiple orders of magnitude and facilitates efficient gradient-based
optimisation for accelerator tuning and system identification. This positions
Cheetah as a user-friendly, readily extensible tool that integrates seamlessly
with widely adopted machine learning tools. We showcase the utility of Cheetah
through five examples, including reinforcement learning training,
gradient-based beamline tuning, gradient-based system identification,
physics-informed Bayesian optimisation priors, and modular neural network
surrogate modelling of space charge effects. The use of such a high-speed
differentiable simulation code will simplify the development of machine
learning-based methods for particle accelerators and fast-track their
integration into everyday operations of accelerator facilities.
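In the spirit of such a differentiable code (a toy sketch, not Cheetah's API), one can backpropagate through drift and thin-lens quadrupole transfer matrices to tune a magnet strength:

    import torch

    def mat2(a, b, c, d):
        """2x2 transfer matrix from scalars/0-dim tensors (keeps autograd)."""
        to = lambda v: torch.as_tensor(v, dtype=torch.float32)
        return torch.stack([torch.stack([to(a), to(b)]),
                            torch.stack([to(c), to(d)])])

    k = torch.tensor(1.0, requires_grad=True)        # quadrupole strength (1/f)
    sigma0 = torch.diag(torch.tensor([1e-6, 1e-6]))  # initial beam second moments

    opt = torch.optim.Adam([k], lr=0.05)
    for _ in range(200):
        # 1 m drift, thin quadrupole, 1 m drift
        M = mat2(1, 1, 0, 1) @ mat2(1, 0, -k, 1) @ mat2(1, 1, 0, 1)
        sigma = M @ sigma0 @ M.T                     # propagate second moments
        loss = sigma[0, 0]                           # minimize final beam size
        opt.zero_grad(); loss.backward(); opt.step()

    print(float(k))   # approaches 1.5, the optimum for this toy lattice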
( 2 min )
Heating, Ventilation, and Air Conditioning (HVAC) systems are a major driver
of energy consumption in commercial and residential buildings. Recent studies
have shown that Deep Reinforcement Learning (DRL) algorithms can outperform
traditional reactive controllers. However, DRL-based solutions are generally
designed for ad hoc setups and lack standardization for comparison. To fill
this gap, this paper provides a critical and reproducible evaluation, in terms
of comfort and energy consumption, of several state-of-the-art DRL algorithms
for HVAC control. The study examines the controllers' robustness, adaptability,
and trade-off between optimization goals by using the Sinergym framework. The
results obtained confirm the potential of DRL algorithms, such as SAC and TD3,
in complex scenarios and reveal several challenges related to generalization
and incremental learning.
( 2 min )
This article investigates the possibility of using the class entropy of the
output of a connectionist phoneme recogniser to predict time boundaries between
phonetic classes. The rationale is that the value of the entropy should
increase near a transition between two segments that are well modelled (known)
by the recognition network, since entropy is a measure of uncertainty. The
advantage of this measure is its simplicity, as the posterior
probabilities of each class are available in connectionist phoneme recognition.
The entropy and a number of measures based on differentiation of the entropy
are used in isolation and in combination. The decision methods for predicting
the boundaries range from simple thresholds to neural network based procedures.
The different methods are compared with respect to their precision, measured in
terms of the ratio between the number C of predicted boundaries within 10 or 20
msec of the reference and the total number of predicted boundaries, and recall,
measured as the ratio between C and the total number of reference boundaries.
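A hedged sketch of the simplest variant: per-frame posterior entropy, a threshold-plus-local-maximum boundary detector, and the precision/recall definitions above (threshold and frame rate are assumptions):

    import numpy as np

    def entropy_boundaries(posteriors, frame_ms=10, threshold=0.6):
        """Flag boundaries at local maxima of the posterior entropy that
        exceed a threshold. `posteriors`: (frames, classes)."""
        p = np.clip(posteriors, 1e-12, 1.0)
        H = -(p * np.log(p)).sum(axis=1)              # per-frame entropy
        peaks = [t for t in range(1, len(H) - 1)
                 if H[t] > threshold and H[t] >= H[t - 1] and H[t] >= H[t + 1]]
        return np.array(peaks) * frame_ms             # boundary times in ms

    def precision_recall(pred_ms, ref_ms, tol_ms=20):
        # C = number of predicted boundaries within tol_ms of a reference.
        C = sum(any(abs(p - r) <= tol_ms for r in ref_ms) for p in pred_ms)
        return C / max(len(pred_ms), 1), C / max(len(ref_ms), 1)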
( 2 min )
In this paper, we study the problem of estimating the normalizing constant
$\int e^{-\lambda f(x)}dx$ through queries to the black-box function $f$, where
$f$ belongs to a reproducing kernel Hilbert space (RKHS), and $\lambda$ is a
problem parameter. We show that to estimate the normalizing constant within a
small relative error, the level of difficulty depends on the value of
$\lambda$: When $\lambda$ approaches zero, the problem is similar to Bayesian
quadrature (BQ), while when $\lambda$ approaches infinity, the problem is
similar to Bayesian optimization (BO). More generally, the problem varies
between BQ and BO. We find that this pattern holds true even when the function
evaluations are noisy, bringing new aspects to this topic. Our findings are
supported by both algorithm-independent lower bounds and algorithmic upper
bounds, as well as simulation studies conducted on a variety of benchmark
functions.
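A standard heuristic (a Taylor/Laplace argument, not the paper's bounds) makes the two regimes concrete:
$$Z(\lambda)=\int e^{-\lambda f(x)}\,dx \;\approx\; \begin{cases}|\mathcal{X}|-\lambda\int f(x)\,dx, & \lambda\to 0,\\ e^{-\lambda f(x^\ast)}\,(2\pi/\lambda)^{d/2}\,\det\big(\nabla^2 f(x^\ast)\big)^{-1/2}, & \lambda\to\infty,\end{cases}$$
where $x^\ast=\arg\min_x f(x)$: the small-$\lambda$ case reduces to integrating $f$ (a BQ-type task), while the large-$\lambda$ case hinges on locating the minimizer of $f$ (a BO-type task).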
( 2 min )
We study reinforcement learning in the presence of an unknown reward
perturbation. Existing methodologies for this problem make strong assumptions
including reward smoothness, known perturbations, and/or perturbations that do
not modify the optimal policy. We study the case of unknown arbitrary
perturbations that discretize and shuffle reward space, but have the property
that the true reward belongs to the most frequently observed class after
perturbation. This class of perturbations generalizes existing classes (and, in
the limit, all continuous bounded perturbations) and defeats existing methods.
We introduce an adaptive distributional reward critic and show theoretically
that it can recover the true rewards under technical conditions. Under the
targeted perturbation in discrete and continuous control tasks, we win/tie the
highest return in 40/57 settings (compared to 16/57 for the best baseline).
Even under the untargeted perturbation, we retain an edge over the baseline
designed especially for that setting.
( 2 min )
Control Barrier Functions (CBFs) provide an elegant framework for designing
safety filters for nonlinear control systems by constraining their trajectories
to an invariant subset of a prespecified safe set. However, the task of finding
a CBF that concurrently maximizes the volume of the resulting control invariant
set while accommodating complex safety constraints, particularly in high
relative degree systems with actuation constraints, continues to pose a
substantial challenge. In this work, we propose a novel self-supervised
learning framework that holistically addresses these hurdles. Given a Boolean
composition of multiple state constraints that define the safe set, our
approach starts with building a single continuously differentiable function
whose 0-superlevel set provides an inner approximation of the safe set. We then
use this function together with a smooth neural network to parameterize the CBF
candidate. Finally, we design a training loss function based on a
Hamilton-Jacobi partial differential equation to train the CBF while enlarging
the volume of the induced control invariant set. We demonstrate the
effectiveness of our approach via numerical experiments.
( 2 min )
Transfer learning (TL) is an increasingly popular approach to training deep
learning (DL) models that leverages the knowledge gained by training a
foundation model on diverse, large-scale datasets for use on downstream tasks
where less domain- or task-specific data is available. The literature is rich
with TL techniques and applications; however, the bulk of the research makes
use of deterministic DL models which are often uncalibrated and lack the
ability to communicate a measure of epistemic (model) uncertainty in
prediction. Unlike their deterministic counterparts, Bayesian DL (BDL) models
are often well-calibrated, provide access to epistemic uncertainty for a
prediction, and are capable of achieving competitive predictive performance. In
this study, we propose variational inference pre-trained audio neural networks
(VI-PANNs). VI-PANNs are a variational inference variant of the popular
ResNet-54 architecture, pre-trained on AudioSet, a large-scale audio
event detection dataset. We evaluate the quality of the resulting uncertainty
when transferring knowledge from VI-PANNs to other downstream acoustic
classification tasks using the ESC-50, UrbanSound8K, and DCASE2013 datasets. We
demonstrate, for the first time, that it is possible to transfer calibrated
uncertainty information along with knowledge from upstream tasks to enhance a
model's capability to perform downstream tasks.
( 2 min )
The problem of data clustering is one of the most important in data analysis.
It can be problematic when dealing with experimental data characterized by
measurement uncertainties and errors. Our paper proposes a recursive scheme for
clustering data obtained in geographical (climatological) experiments. The
results obtained by the k-means and SOM methods combined with the developed
recursive procedure are discussed. We show that clustering with the new
approach gives more acceptable results when compared to experts' assessments.
( 2 min )
Graph neural networks (GNNs) are a powerful tool for combining imaging and
non-imaging medical information for node classification tasks. Cross-network
node classification extends GNN techniques to account for domain drift,
allowing for node classification on an unlabeled target network. In this paper
we present OTGCN, a powerful, novel approach to cross-network node
classification. This approach leans on concepts from graph convolutional
networks to harness insights from graph data structures while simultaneously
applying strategies rooted in optimal transport to correct for the domain drift
that can occur between samples from different data collection sites. This
blended approach provides a practical solution for scenarios with many distinct
forms of data collected across different locations and equipment. We
demonstrate the effectiveness of this approach at classifying Autism Spectrum
Disorder subjects using a blend of imaging and non-imaging data.
( 2 min )
This paper introduces a new problem in the field of graph mining and social
network analysis called new node prediction. More technically, the task can be
categorized as zero-shot out-of-graph all-links prediction. This challenging
problem aims to predict all links from a new, isolated, and unobserved node
that was previously disconnected from the graph. Unlike classic approaches to
link prediction (including few-shot out-of-graph link prediction), this problem
presents two key differences: (1) the new node has no existing links from which
to extract patterns for new predictions; and (2) the goal is to predict not
just one, but all the links of this new node, or at least a significant part of
them. Experiments demonstrate that an architecture based on Deep Graph Neural
Networks can learn to solve this challenging problem in a bibliographic
citation network.
( 2 min )
We present a nonparametric method for outlier detection that takes full
account of local variations in intrinsic dimensionality within the dataset.
Using the theory of Local Intrinsic Dimensionality (LID), our
'dimensionality-aware' outlier detection method, DAO, is derived as an
estimator of an asymptotic local expected density ratio involving the query
point and a close neighbor drawn at random. The dimensionality-aware behavior
of DAO is due to its use of local estimation of LID values in a
theoretically-justified way. Through comprehensive experimentation on more than
800 synthetic and real datasets, we show that DAO significantly outperforms
three popular and important benchmark outlier detection methods: Local Outlier
Factor (LOF), Simplified LOF, and kNN.
( 2 min )
We propose two novel purpose-built deep learning (DL) models for synthesis of
the arterial blood pressure (ABP) waveform in a cuff-less manner, using a
single-site photoplethysmography (PPG) signal. We utilize the public UCI
dataset on cuff-less blood pressure (CLBP) estimation to train and evaluate our
DL models. Firstly, we implement a transformer model that incorporates
positional encoding, multi-head attention, layer normalization, and dropout
techniques, and synthesizes the ABP waveform with a mean absolute error (MAE)
of 14. Secondly, we implement a frequency-domain (FD) learning approach where
we first obtain the discrete cosine transform (DCT) coefficients of the PPG and
ABP signals corresponding to two cardiac cycles, and then learn a
linear/non-linear (L/NL) regression between them. We find that the FD L/NL
regression model outperforms the transformer model by achieving an MAE of 11.87
and 8.01, for diastolic blood pressure (DBP) and systolic blood pressure (SBP),
respectively. Our FD L/NL regression model also fulfills the AAMI criterion of
utilizing data from more than 85 subjects, and achieves grade B by the BHS
criterion.
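An illustrative sketch of the frequency-domain idea (segment length, number of retained coefficients, and the ridge penalty are assumptions; the data here are random placeholders):

    import numpy as np
    from scipy.fft import dct, idct
    from sklearn.linear_model import Ridge

    # Regress DCT coefficients of the ABP segment on DCT coefficients of
    # the time-aligned PPG segment; this is not the paper's exact pipeline.
    rng = np.random.default_rng(0)
    ppg = rng.normal(size=(500, 256))       # two-cardiac-cycle PPG segments
    abp = rng.normal(size=(500, 256))       # matching ABP segments (toy data)

    X = dct(ppg, type=2, norm="ortho", axis=1)[:, :40]   # low-order coefficients
    Y = dct(abp, type=2, norm="ortho", axis=1)[:, :40]

    reg = Ridge(alpha=1.0).fit(X, Y)        # linear frequency-domain regression

    Y_hat = reg.predict(X)
    abp_hat = idct(np.pad(Y_hat, ((0, 0), (0, 216))), type=2, norm="ortho", axis=1)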
( 2 min )
The advent of compact, handheld devices has given us a pool of tracked
movement data that can be used to infer trends and patterns. With this flood of
trajectory data from animals, humans, vehicles, etc., the idea of ANALYTiC
originated: using active learning to infer
semantic annotations from the trajectories by learning from sets of labeled
data. This study explores the application of dimensionality reduction and
decision boundaries in combination with the already present active learning,
highlighting patterns and clusters in data. We test these features with three
different trajectory datasets, with the objective of exploiting the already
labeled data and enhancing their interpretability. Our experimental analysis
exemplifies the potential of these combined methodologies in improving the
efficiency and accuracy of trajectory labeling. This study serves as a
stepping-stone towards the broader integration of machine learning and visual
methods in the context of movement data analysis.
( 2 min )
Financial data is generally time series in essence and thus suffers from
three fundamental issues: the mismatch in time resolution, the time-varying
property of the distribution - nonstationarity, and causal factors that are
important but unknown/unobserved. In this paper, we follow a causal perspective
to systematically look into these three demons in finance. Specifically, we
reexamine these issues in the context of causality, which gives rise to a novel
and inspiring understanding of how the issues can be addressed. Following this
perspective, we provide systematic solutions to these problems, which hopefully
would serve as a foundation for future research in the area.
( 2 min )
In this work, we propose a denoising diffusion generative model (DDGM)
trained with healthy electrocardiogram (ECG) data that focuses on ECG
morphology and inter-lead dependence. Our results show that this innovative
generative model can successfully generate realistic ECG signals. Furthermore,
we explore the application of recent breakthroughs in solving linear inverse
Bayesian problems using DDGM. This approach enables the development of several
important clinical tools. These include the calculation of corrected QT
intervals (QTc), effective noise suppression of ECG signals, recovery of
missing ECG leads, and identification of anomalous readings, enabling
significant advances in cardiac health monitoring and diagnosis.
( 2 min )
In automotive applications, frequency modulated continuous wave (FMCW) radar
is an established technology to determine the distance, velocity and angle of
objects in the vicinity of the vehicle. The quality of predictions might be
seriously impaired if mutual interference between radar sensors occurs.
Previous work processes data from the entire receiver array in parallel to
increase interference mitigation quality using neural networks (NNs). However,
these architectures do not generalize well across different angles of arrival
(AoAs) of interferences and objects. In this paper we introduce a fully
convolutional neural network (CNN) with rank-three convolutions which is able
to transfer learned patterns between different AoAs. Our proposed architecture
outperforms previous work while having higher robustness and a lower number of
trainable parameters. We evaluate our network on a diverse data set and
demonstrate its angle equivariance.
( 2 min )
Accurate RNA secondary structure prediction is vital for understanding
cellular regulation and disease mechanisms. Deep learning (DL) methods have
surpassed traditional algorithms by predicting complex features like
pseudoknots and multi-interacting base pairs. However, traditional distance
measures can hardly deal with such tertiary interactions and the currently used
evaluation measures (F1 score, MCC) have limitations. We propose the
Weisfeiler-Lehman graph kernel (WL) as an alternative metric. Embracing
graph-based metrics like WL enables fair and accurate evaluation of RNA
structure prediction algorithms. Further, WL provides informative guidance, as
demonstrated in an RNA design experiment.
( 2 min )
Robustness certification, which aims to formally certify the predictions of
neural networks against adversarial inputs, has become an important tool for
safety-critical applications. Despite considerable progress,
existing certification methods are limited to elementary architectures, such as
convolutional networks, recurrent networks and recently Transformers, on
benchmark datasets such as MNIST. In this paper, we focus on the robustness
certification of scene text recognition (STR), which is a complex and
extensively deployed image-based sequence prediction problem. We tackle three
types of STR model architectures, including the standard STR pipelines and the
Vision Transformer. We propose STR-Cert, the first certification method for STR
models, by significantly extending the DeepPoly polyhedral verification
framework via deriving novel polyhedral bounds and algorithms for key STR model
components. Finally, we certify and compare STR models on six datasets,
demonstrating the efficiency and scalability of robustness certification,
particularly for the Vision Transformer.
( 2 min )
This study presents an unsupervised machine learning approach for optimizing
Profit and Loss (PnL) in quantitative finance. Our algorithm, akin to an
unsupervised variant of linear regression, maximizes the Sharpe Ratio of PnL
generated from signals constructed linearly from exogenous variables. The
methodology employs a linear relationship between exogenous variables and the
trading signal, with the objective of maximizing the Sharpe Ratio through
parameter optimization. Empirical application on an ETF representing U.S.
Treasury bonds demonstrates the model's effectiveness, supported by
regularization techniques to mitigate overfitting. The study concludes with
potential avenues for further development, including generalized time steps and
enhanced corrective terms.
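A hedged sketch of the objective: gradient ascent on the in-sample Sharpe ratio of a linear signal, with an L2 term standing in for the paper's regularization (the data and the simplified PnL definition are illustrative):

    import torch

    torch.manual_seed(0)
    X = torch.randn(1000, 5)              # exogenous variables
    r = torch.randn(1000)                 # asset returns (toy placeholders)

    w = torch.nn.Parameter(0.1 * torch.randn(5))
    opt = torch.optim.Adam([w], lr=0.01)
    for _ in range(500):
        s = X @ w                         # linear trading signal
        pnl = s[:-1] * r[1:]              # trade on the previous signal
        sharpe = pnl.mean() / (pnl.std() + 1e-8)
        loss = -sharpe + 1e-3 * w.pow(2).sum()   # L2 term against overfitting
        opt.zero_grad(); loss.backward(); opt.step()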
( 2 min )
The fairness of Natural Language Processing (NLP) models has emerged as a
crucial concern. Information theory indicates that to achieve fairness, a model
should not be able to predict sensitive variables, such as gender, ethnicity,
and age. However, information related to these variables often appears
implicitly in language, posing a challenge in identifying and mitigating biases
effectively. To tackle this issue, we present a novel approach that operates at
the embedding level of an NLP model, independent of the specific architecture.
Our method leverages insights from recent advances in XAI techniques and
employs an embedding transformation to eliminate implicit information from a
selected variable. By directly manipulating the embeddings in the final layer,
our approach enables a seamless integration into existing models without
requiring significant modifications or retraining. In our evaluation, we show that
the proposed post-hoc approach significantly reduces gender-related
associations in NLP models while preserving the overall performance and
functionality of the models. An implementation of our method is available:
https://github.com/fanny-jourdan/TaCo
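For intuition, a much simpler single-direction variant of embedding-level concept removal (fit a linear probe for the sensitive variable and project its direction out); the paper's XAI-guided transformation is more elaborate than this sketch:

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    def remove_concept_direction(E, z):
        """Fit a linear probe for the sensitive variable z (e.g., gender)
        and project its direction out of the final-layer embeddings E."""
        probe = LogisticRegression(max_iter=1000).fit(E, z)
        d = probe.coef_[0] / np.linalg.norm(probe.coef_[0])  # concept direction
        return E - np.outer(E @ d, d)                        # orthogonal projection

    rng = np.random.default_rng(0)
    E = rng.normal(size=(200, 768))        # final-layer embeddings (toy)
    z = rng.integers(0, 2, size=200)       # sensitive labels
    E_clean = remove_concept_direction(E, z)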
( 2 min )
“This year, every industry will become a technology industry,” NVIDIA founder and CEO Jensen Huang told attendees Wednesday during the annual J.P. Morgan Healthcare Conference. “You can now recognize and learn the language of almost anything with structure, and you can translate it to anything with structure — so text-protein, protein-text,” Huang said.
( 6 min )
Enterprises have access to massive amounts of data, much of which is difficult to discover because the data is unstructured. Conventional approaches to analyzing unstructured data use keyword or synonym matching. They don’t capture the full context of a document, making them less effective in dealing with unstructured data. In contrast, text embeddings use machine […]
( 12 min )
The ability to accurately approximate trajectories of dynamical systems
enables their analysis, prediction, and control. Neural network (NN)-based
approximations have attracted significant interest due to fast evaluation with
good accuracy over long integration time steps. In contrast to established
numerical approximation schemes such as Runge-Kutta methods, the estimation of
the error of the NN-based approximations proves to be difficult. In this work,
we propose to use the NN's predictions in a high-order implicit Runge-Kutta
(IRK) method. The residuals in the implicit system of equations can be related
to the NN's prediction error, hence, we can provide an error estimate at
several points along a trajectory. We find that this error estimate highly
correlates with the NN's prediction error and that increasing the order of the
IRK method improves this estimate. We demonstrate this estimation methodology
for Physics-Informed Neural Networks (PINNs) on the logistic equation as an
illustrative example and then apply it to a four-state electric generator model
that is regularly used in power system modelling.
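The same residual idea with the lowest-order Gauss IRK scheme (implicit midpoint), as a hedged sketch; the paper uses higher-order IRK methods:

    import numpy as np

    def irk_residual(f, x0, x1_nn, h):
        """Plug the NN's predicted next state x1_nn into the implicit
        midpoint update and use the residual norm as an error indicator."""
        return np.linalg.norm(x1_nn - x0 - h * f(0.5 * (x0 + x1_nn)))

    # Logistic equation dx/dt = x(1 - x), as in the paper's first example.
    f = lambda x: x * (1.0 - x)
    x0, h = np.array([0.1]), 0.1
    x1_ref = x0 + h * f(x0 + 0.5 * h * f(x0))        # rough reference step
    print(irk_residual(f, x0, x1_ref + 1e-3, h))     # larger error, larger residual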
( 2 min )
Bayesian Neural Networks (BayNNs) can inherently estimate predictive
uncertainty, facilitating informed decision-making. Dropout-based BayNNs are
increasingly implemented in spintronics-based computation-in-memory
architectures for resource-constrained yet high-performance safety-critical
applications. Although uncertainty estimation is important, the reliability of
Dropout generation and BayNN computation is equally important for target
applications but is overlooked in existing works. However, testing BayNNs is
significantly more challenging compared to conventional NNs, due to their
stochastic nature. In this paper, we present for the first time a model of
the non-idealities of the spintronics-based Dropout module and analyze their
impact on uncertainty estimates and accuracy. Furthermore, we propose a testing
framework based on repeatability ranking for Dropout-based BayNN with up to
$100\%$ fault coverage while using only $0.2\%$ of training data as test
vectors.
( 2 min )
The reliable diagnosis of cardiac conditions through electrocardiogram (ECG)
analysis critically depends on accurately detecting P waves and measuring the
PR interval. However, achieving consistent and generalizable diagnoses across
diverse populations presents challenges due to the inherent global variations
observed in ECG signals. This paper is focused on applying the Q-learning
reinforcement algorithm to the various ECG datasets available in the
PhysioNet/Computing in Cardiology Challenge (CinC). Five ECG beat types, namely
Normal Sinus Rhythm, Atrial Flutter, Atrial Fibrillation, 1st Degree
Atrioventricular Block, and Left Atrial Enlargement, are included to study
variations of P waves and PR Interval on Lead II and Lead V1. Q-Agent
classified 71,672 beat samples in 8,867 patients with an average accuracy of
90.4% and an average Hamming loss of only 9.6% over misclassifications. The average
classification time at the 100th episode containing around 40,000 samples is
0.04 seconds. An average training reward of 344.05 is achieved with alpha,
gamma, and softmax temperature set to 0.001, 0.9, and 0.1, respectively.
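For concreteness, a generic tabular Q-learning loop with the reported hyperparameters (the states, transitions, and rewards below are placeholders, not the paper's ECG features):

    import numpy as np

    rng = np.random.default_rng(0)
    n_states, n_actions = 100, 5            # 5 beat classes
    Q = np.zeros((n_states, n_actions))
    alpha, gamma, tau = 0.001, 0.9, 0.1     # values reported in the abstract

    def softmax_action(q_row):
        z = (q_row - q_row.max()) / tau
        p = np.exp(z) / np.exp(z).sum()
        return rng.choice(len(q_row), p=p)

    s = rng.integers(n_states)
    for _ in range(10000):
        a = softmax_action(Q[s])
        s_next = rng.integers(n_states)                 # placeholder transition
        reward = 1.0 if a == s % n_actions else -1.0    # placeholder reward
        Q[s, a] += alpha * (reward + gamma * Q[s_next].max() - Q[s, a])
        s = s_next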
( 2 min )
We propose a novel high-performance, interpretable, and parameter \&
computationally efficient deep learning architecture for tabular data, Gated
Adaptive Network for Deep Automated Learning of Features (GANDALF). GANDALF
relies on a new tabular processing unit with a gating mechanism and in-built
feature selection called Gated Feature Learning Unit (GFLU) as a feature
representation learning unit. We demonstrate that GANDALF outperforms or stays
on par with SOTA approaches like XGBoost, SAINT, and FT-Transformers in
experiments on multiple established public benchmarks. We have made available
the code at github.com/manujosephv/pytorch_tabular under MIT License.
( 2 min )
In this paper, we consider a wireless network of smart sensors (agents) that
monitor a dynamical process and send measurements to a base station that
performs global monitoring and decision-making. Smart sensors are equipped with
both sensing and computation, and can either send raw measurements or process
them prior to transmission. Constrained agent resources raise a fundamental
latency-accuracy trade-off. On the one hand, raw measurements are inaccurate
but fast to produce. On the other hand, data processing on resource-constrained
platforms generates accurate measurements at the cost of non-negligible
computation latency. Further, if processed data are also compressed, latency
caused by wireless communication might be higher for raw measurements. Hence,
it is challenging to decide when and where sensors in the network should
transmit raw measurements or leverage time-consuming local processing. To
tackle this design problem, we propose a Reinforcement Learning approach to
learn an efficient policy that dynamically decides when measurements are to be
processed at each sensor. Effectiveness of our proposed approach is validated
through a numerical simulation with a case study on smart sensing motivated by
the Internet of Drones.
( 3 min )
Models trained with empirical risk minimization (ERM) are known to learn to
rely on spurious features, i.e., their prediction is based on undesired
auxiliary features which are strongly correlated with class labels but lack
causal reasoning. This behavior particularly degrades accuracy in groups of
samples of the correlated class that are missing the spurious feature or
samples of the opposite class but with the spurious feature present. The
recently proposed Deep Feature Reweighting (DFR) method improves accuracy of
these worst groups. Based on the main argument that ERM models can learn core
features sufficiently well, DFR only needs to retrain the last layer of the
classification model with a small group-balanced data set. In this work, we
examine the applicability of DFR to realistic data in the medical domain.
Furthermore, we investigate the reasoning behind the effectiveness of
last-layer retraining and show that even though DFR has the potential to
improve the accuracy of the worst group, it remains susceptible to spurious
correlations.
( 2 min )
We propose a new training algorithm, named DualFL (Dualized Federated
Learning), for solving distributed optimization problems in federated learning.
DualFL achieves communication acceleration for very general convex cost
functions, thereby providing a solution to an open theoretical problem in
federated learning concerning cost functions that may be neither smooth nor
strongly convex. We provide a detailed analysis for the local iteration
complexity of DualFL to ensure the overall computational efficiency of DualFL.
Furthermore, we introduce a completely new approach for the convergence
analysis of federated learning based on a dual formulation. This new technique
enables a concise and elegant analysis, which contrasts with the complex calculations
used in existing literature on convergence of federated learning algorithms.
( 2 min )
Effective disaster response is critical for affected communities. Responders
and decision-makers would benefit from reliable, timely measures of the issues
impacting their communities during a disaster, and social media offers a
potentially rich data source. Social media can reflect public concerns and
demands during a disaster, offering valuable insights for decision-makers to
understand evolving situations and optimize resource allocation. We used
Bidirectional Encoder Representations from Transformers (BERT) topic modeling
to cluster topics from Twitter data. Then, we conducted a temporal-spatial
analysis to examine the distribution of these topics across different regions
during the 2020 western U.S. wildfire season. Our results show that Twitter
users mainly focused on three topics: "health impact," "damage," and
"evacuation." We used the Susceptible-Infected-Recovered (SIR) theory to
explore the magnitude and velocity of topic diffusion on Twitter. The results
displayed a clear relationship between topic trends and wildfire propagation
patterns. The estimated parameters obtained from the SIR model in selected
cities revealed that residents exhibited high levels of concern across several topics
during the wildfire. Our study details how the SIR model and topic modeling
using social media data can provide decision-makers with a quantitative
approach to measure disaster response and support their decision-making
processes.
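For reference, the SIR dynamics used for topic diffusion can be simulated in a few lines (the parameters below are illustrative; the paper estimates them per city from topic time series):

    import numpy as np
    from scipy.integrate import solve_ivp

    def sir(t, y, beta, gamma):
        """Classic SIR dynamics; here S/I/R play the roles of users who have
        not yet posted on a topic, are actively posting, and have stopped."""
        S, I, R = y
        N = S + I + R
        return [-beta * S * I / N, beta * S * I / N - gamma * I, gamma * I]

    beta, gamma = 0.5, 0.1                  # illustrative diffusion parameters
    sol = solve_ivp(sir, (0, 60), [9990, 10, 0], args=(beta, gamma),
                    t_eval=np.linspace(0, 60, 61))
    # Velocity of topic spread: new adopters per day along the trajectory.
    topic_velocity = beta * sol.y[0] * sol.y[1] / sol.y.sum(axis=0)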
( 3 min )
Monotone missing data is a common problem in data analysis. However,
imputation combined with dimensionality reduction can be computationally
expensive, especially with the increasing size of datasets. To address this
issue, we propose a Blockwise principal component analysis Imputation (BPI)
framework for dimensionality reduction and imputation of monotone missing data.
The framework conducts Principal Component Analysis (PCA) on the observed part
of each monotone block of the data and then imputes the merged principal
components using a chosen imputation technique. BPI can work with
various imputation techniques and can significantly reduce imputation time
compared to conducting dimensionality reduction after imputation. This makes it
a practical and efficient approach for large datasets with monotone missing
data. Our experiments validate the improvement in speed. In addition, our
experiments also show that while applying MICE imputation directly on missing
data may not yield convergence, applying BPI with MICE for the data may lead to
convergence.
( 2 min )
In continual learning (CL), an AI agent (e.g., autonomous vehicles or
robotics) learns from non-stationary data streams under dynamic environments.
For the practical deployment of such applications, it is important to guarantee
robustness to unseen environments while maintaining past experiences. In this
paper, a novel CL framework is proposed to achieve robust generalization to
dynamic environments while retaining past knowledge. The considered CL agent
uses a capacity-limited memory to save previously observed environmental
information to mitigate forgetting issues. Then, data points are sampled from
the memory to estimate the distribution of risks over environmental change so
as to obtain predictors that are robust with unseen changes. The generalization
and memorization performance of the proposed framework are theoretically
analyzed. This analysis showcases the tradeoff between memorization and
generalization with the memory size. Experiments show that the proposed
algorithm outperforms memory-based CL baselines across all environments while
significantly improving the generalization performance on unseen target
environments.
( 2 min )
In this paper, we design a real-time question-answering system specifically
targeted for helping sellers get relevant material/documentation they can share
live with their customers or refer to during a call. Taking the Seismic content
repository as a relatively large scale example of a diverse dataset of sales
material, we demonstrate how LLM embeddings of sellers' queries can be matched
with the relevant content. We achieve this by engineering prompts in an
elaborate fashion that makes use of the rich set of meta-features available for
documents and sellers. Using a bi-encoder with cross-encoder re-ranker
architecture, we show how the solution returns the most relevant content
recommendations in just a few seconds even for large datasets. Our recommender
system is deployed as an AML endpoint for real-time inferencing and has been
integrated into a Copilot interface that is now deployed in the production
version of the Dynamics CRM, known as MSX, used daily by Microsoft sellers.
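A hedged sketch of the retrieve-then-rerank pattern described above, with public default models (the production models, prompts, and meta-features are not reproduced here):

    import numpy as np
    from sentence_transformers import SentenceTransformer, CrossEncoder

    docs = ["pricing one-pager for product A", "security whitepaper",
            "case study: retail"]
    query = "customer asked about data security certifications"

    # Fast first stage: bi-encoder embeddings and cosine similarity.
    bi = SentenceTransformer("all-MiniLM-L6-v2")
    doc_emb = bi.encode(docs)
    q_emb = bi.encode([query])[0]
    scores = doc_emb @ q_emb / (np.linalg.norm(doc_emb, axis=1)
                                * np.linalg.norm(q_emb))
    candidates = [docs[i] for i in np.argsort(-scores)[:2]]

    # Precise second stage: cross-encoder re-ranking of the shortlist.
    ce = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")
    rerank = ce.predict([(query, d) for d in candidates])
    best = candidates[int(np.argmax(rerank))]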
( 2 min )
Mobile autonomy relies on the precise perception of dynamic environments.
Robustly tracking moving objects in the 3D world thus plays a pivotal role in
applications like trajectory prediction, obstacle avoidance, and path planning.
While most current methods utilize LiDARs or cameras for Multiple Object
Tracking (MOT), the capabilities of 4D imaging radars remain largely
unexplored. Recognizing the challenges posed by radar noise and point sparsity
in 4D radar data, we introduce RaTrack, an innovative solution tailored for
radar-based tracking. Bypassing the typical reliance on specific object types
and 3D bounding boxes, our method focuses on motion segmentation and
clustering, enriched by a motion estimation module. Evaluated on the
View-of-Delft dataset, RaTrack showcases superior tracking precision of moving
objects, largely surpassing the performance of the state of the art.
( 2
min )
To investigate the processing of speech in the brain, simple linear models
are commonly used to establish a relationship between brain signals and speech
features. However, these linear models are ill-equipped to model a highly
dynamic and complex non-linear system like the brain. Although non-linear
methods with neural networks have been developed recently, reconstructing
unseen stimuli from unseen subjects' EEG is still a highly challenging task.
This work presents a novel method, ConvConcatNet, to reconstruct
mel-spectrograms from EEG by combining a deep convolutional neural network with
extensive concatenation operations. With our ConvConcatNet model, the Pearson
correlation between the reconstructed and target mel-spectrograms reached
0.0420, ranking first in Task 2 of the Auditory EEG Challenge. The code and
models for our work will be available on GitHub:
https://github.com/xuxiran/ConvConcatNet
( 2
min )
To address the possible lack or total absence of pulses from particle
detectors during the development of their associated electronics, we propose a
model that can generate such pulses without losing the features of the real ones. This
model is based on artificial neural networks, namely Generative Adversarial
Networks (GAN). We describe the proposed network architecture, its training
methodology and the approach to train the GAN with real pulses from a
scintillator receiving radiation from sources of ${}^{137}$Cs and ${}^{22}$Na.
The generator was deployed on a Xilinx System-on-Chip (SoC). We show how the
network is capable of generating pulses with the same shape as the real ones
that even match the data distributions in the original pulse-height histogram
data.
( 2
min )
We introduce Matcha-TTS, a new encoder-decoder architecture for speedy TTS
acoustic modelling, trained using optimal-transport conditional flow matching
(OT-CFM). This yields an ODE-based decoder capable of high output quality in
fewer synthesis steps than models trained using score matching. Careful design
choices additionally ensure each synthesis step is fast to run. The method is
probabilistic, non-autoregressive, and learns to speak from scratch without
external alignments. Compared to strong pre-trained baseline models, the
Matcha-TTS system has the smallest memory footprint, rivals the speed of the
fastest models on long utterances, and attains the highest mean opinion score
in a listening test. Please see https://shivammehta25.github.io/Matcha-TTS/ for
audio examples, code, and pre-trained models.
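The OT-CFM training objective is compact enough to write out; the sketch below
follows the generic conditional flow-matching recipe for an unconditional toy
case, with vector_field standing in for the decoder network (this is our
simplification, not the Matcha-TTS code):

    import torch

    def ot_cfm_loss(vector_field, x1, sigma_min=1e-4):
        """OT conditional flow matching: regress the network onto the target
        velocity of a (nearly) straight path from noise x0 to data x1."""
        x0 = torch.randn_like(x1)               # noise endpoint
        t = torch.rand(x1.shape[0], 1)          # random time in [0, 1]
        x_t = (1 - (1 - sigma_min) * t) * x0 + t * x1
        u_t = x1 - (1 - sigma_min) * x0         # target velocity field
        return ((vector_field(x_t, t) - u_t) ** 2).mean()

At synthesis time one integrates the learned ODE dx/dt = v(x, t) from t = 0 to
t = 1 with a handful of solver steps, which is where the advantage in
synthesis-step count over score-matching models comes from.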
( 2
min )
Transformers require a fixed number of layers and heads, which makes them
inflexible to the complexity of individual samples and expensive in training
and inference. To address this, we propose a sample-based Dynamic Hierarchical
Transformer (DHT) model whose layers and heads can be dynamically configured
per data sample by solving contextual bandit problems. To determine
the number of layers and heads, we use the Uniform Confidence Bound while we
deploy combinatorial Thompson Sampling in order to select specific head
combinations given their number. Different from previous work that focuses on
compressing trained networks for inference only, DHT is not only advantageous
for adaptively optimizing the underlying network architecture during training
but also has a flexible network for efficient inference. To the best of our
knowledge, this is the first comprehensive data-driven dynamic transformer
without any additional auxiliary neural networks that implement the dynamic
system. According to the experiment results, we achieve up to 74% computational
savings for both training and inference with a minimal loss of accuracy.
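As a rough picture of the bandit machinery for head selection (an illustrative
Beta-Bernoulli reward model, not the DHT implementation), combinatorial
Thompson Sampling over head subsets can look like this:

    import numpy as np

    rng = np.random.default_rng(0)
    n_heads, k = 8, 4          # choose k of n_heads attention heads
    alpha = np.ones(n_heads)   # Beta posterior: per-head successes
    beta = np.ones(n_heads)    # Beta posterior: per-head failures

    def select_heads():
        # Sample a utility per head from its posterior and keep the top k.
        theta = rng.beta(alpha, beta)
        return np.argsort(theta)[-k:]

    def update(heads, reward):
        # Binary reward, e.g. whether the sampled configuration was accurate.
        alpha[heads] += reward
        beta[heads] += 1 - reward

    heads = select_heads()
    update(heads, reward=1)    # pretend the chosen subset performed well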
( 3
min )
We introduce a novel framework for analyzing reinforcement learning (RL) in
continuous state-action spaces, and use it to prove fast rates of convergence
in both off-line and on-line settings. Our analysis highlights two key
stability properties, relating to how changes in value functions and/or
policies affect the Bellman operator and occupation measures. We argue that
these properties are satisfied in many continuous state-action Markov decision
processes, and demonstrate how they arise naturally when using linear function
approximation methods. Our analysis offers fresh perspectives on the roles of
pessimism and optimism in off-line and on-line RL, and highlights the
connection between off-line RL and transfer learning.
( 2
min )
Digital image correlation (DIC) has become a valuable tool to monitor and
evaluate mechanical experiments on cracked specimens, but the automatic
detection of cracks is often difficult due to inherent noise and artefacts.
Machine learning models have been extremely successful in detecting crack paths
and crack tips using DIC-measured, interpolated full-field displacements as
input to a convolution-based segmentation model. Still, big data is needed to
train such models. However, scientific data is often scarce as experiments are
expensive and time-consuming. In this work, we present a method to directly
generate large amounts of artificial displacement data of cracked specimens
resembling real interpolated DIC displacements. The approach is based on
generative adversarial networks (GANs). During training, the discriminator
receives physical domain knowledge in the form of the derived von Mises
equivalent strain. We show that this physics-guided approach leads to improved
results in terms of visual quality of samples, sliced Wasserstein distance, and
geometry score when compared to a classical unguided GAN approach.
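The physics guidance is inexpensive to derive from the displacement fields
themselves; a plane-strain NumPy sketch of the von Mises equivalent strain (a
plausible small-strain variant, not necessarily the authors' exact definition):

    import numpy as np

    def von_mises_equivalent_strain(u, v, spacing=1.0):
        """Equivalent strain from 2D displacement fields u, v on a regular
        grid (axis 0 = y, axis 1 = x); assumes small strains, eps_zz = 0."""
        du_dy, du_dx = np.gradient(u, spacing)
        dv_dy, dv_dx = np.gradient(v, spacing)
        exx, eyy = du_dx, dv_dy
        exy = 0.5 * (du_dy + dv_dx)
        m = (exx + eyy) / 3.0                 # hydrostatic (mean) strain
        dxx, dyy, dzz = exx - m, eyy - m, -m  # deviatoric components
        return np.sqrt(2.0 / 3.0 * (dxx**2 + dyy**2 + 2 * exy**2 + dzz**2))

Feeding this derived field to the discriminator alongside the raw displacements
is what injects the domain knowledge during training.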
( 2
min )
We propose a novel method for privacy-preserving deep neural networks (DNNs)
with the Vision Transformer (ViT). The method allows us not only to train and
test models with visually protected images but also to avoid the performance
degradation caused by the use of encrypted images, whereas
conventional methods cannot avoid the influence of image encryption. A domain
adaptation method is used to efficiently fine-tune ViT with encrypted images.
In experiments, the method is demonstrated to outperform conventional methods
in an image classification task on the CIFAR-10 and ImageNet datasets in terms
of classification accuracy.
( 2
min )
Federated learning (FL) is a promising technology via which some edge
devices/clients collaboratively train a machine learning model orchestrated by
a server. Learning an unfair model is known as a critical problem in federated
learning, where the trained model may unfairly advantage or disadvantage some
of the devices. To tackle this problem, in this work, we propose AdaFed. The
goal of AdaFed is to find an updating direction for the server along which (i)
all the clients' loss functions are decreasing; and (ii) more importantly, the
loss functions of the clients with larger loss values decrease at a higher rate.
AdaFed adaptively tunes this common direction based on the values of local
gradients and loss functions. We validate the effectiveness of AdaFed on a
suite of federated datasets, and demonstrate that AdaFed outperforms
state-of-the-art fair FL methods.
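One hedged way to picture such an update rule (a simplified stand-in, not
AdaFed's actual direction-finding procedure): weight client gradients by their
current losses and accept the step only if, to first order, it decreases every
client's loss:

    import numpy as np

    def fair_update_direction(grads, losses, temperature=1.0):
        """grads: (n_clients, dim) local gradients; losses: (n_clients,).
        Higher-loss clients get larger weights, so their losses shrink faster;
        the direction is kept only if it is a common descent direction."""
        grads, losses = np.asarray(grads), np.asarray(losses)
        w = np.exp(losses / temperature)
        w /= w.sum()
        d = w @ grads
        if np.all(grads @ d > 0):        # descent for every client
            return d
        return grads.mean(axis=0)        # fall back to plain averaging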
( 2
min )
Pandemics, notably the recent COVID-19 outbreak, have impacted both public
health and the global economy. A profound understanding of disease progression
and efficient response strategies is thus needed to prepare for potential
future outbreaks. In this paper, we emphasize the potential of Agent-Based
Models (ABM) in capturing complex infection dynamics and understanding the
impact of interventions. We simulate realistic pharmaceutical, behavioral, and
digital interventions that mirror challenges in real-world policy adoption and
suggest a holistic combination of these interventions for pandemic response.
Using these simulations, we study the trends of emergent behavior on a
large-scale population based on real-world socio-demographic and geo-census
data from Kings County in Washington. Our analysis reveals the pivotal role of
the initial 100 days in dictating a pandemic's course, emphasizing the
importance of quick decision-making and efficient policy development. Further,
we highlight that investing in behavioral and digital interventions can reduce
the burden on pharmaceutical interventions by reducing the total number of
infections and hospitalizations, and by delaying the pandemic's peak. We also
infer that allocating the same amount of dollars towards extensive testing with
contact tracing and self-quarantine offers greater cost efficiency compared to
spending the entire budget on vaccinations.
( 3
min )
Improving energy efficiency in industrial production processes is crucial for
competitiveness and compliance with climate policies. This paper introduces a
data-driven approach to identify optimal melting patterns in induction
furnaces. Through time-series K-means clustering, the melting patterns were
classified into distinct clusters based on their temperature profiles. Using the
elbow method, 12 clusters were identified, representing the range of melting
patterns. Performance parameters such as melting time, energy-specific
performance, and carbon cost were established for each cluster, indicating
furnace efficiency and environmental impact. Multiple criteria decision-making
methods including Simple Additive Weighting, Multiplicative Exponential
Weighting, Technique for Order of Preference by Similarity to Ideal Solution,
modified TOPSIS, and VlseKriterijumska Optimizacija I Kompromisno Resenje were
utilized to determine the best-practice cluster. The study successfully
identified the cluster with the best performance. Implementing the best
practice operation resulted in an 8.6% reduction in electricity costs,
highlighting the potential energy savings in the foundry.
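The selection step lends itself to a compact sketch. Below, plain Euclidean
K-means stands in for time-series K-means, and a small TOPSIS routine ranks the
clusters on synthetic cost-type criteria; the data, criteria values, and equal
weights are all assumptions of the illustration:

    import numpy as np
    from sklearn.cluster import KMeans

    rng = np.random.default_rng(0)
    profiles = rng.normal(size=(200, 60))   # 200 melts, 60 temperature samples
    labels = KMeans(n_clusters=12, n_init=10, random_state=0).fit_predict(profiles)

    # Per-cluster criteria: melting time, specific energy, carbon cost
    # (all "lower is better" here; values are placeholders).
    criteria = rng.uniform(1, 10, size=(12, 3))

    def topsis(X, weights, benefit):
        """Rank alternatives by relative closeness to the ideal solution."""
        Z = X / np.linalg.norm(X, axis=0) * weights
        ideal = np.where(benefit, Z.max(0), Z.min(0))
        anti = np.where(benefit, Z.min(0), Z.max(0))
        d_pos = np.linalg.norm(Z - ideal, axis=1)
        d_neg = np.linalg.norm(Z - anti, axis=1)
        return d_neg / (d_pos + d_neg)

    scores = topsis(criteria, np.ones(3) / 3, np.zeros(3, dtype=bool))
    print("best-practice cluster:", int(np.argmax(scores)))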
( 2
min )
This Letter introduces an approach for precisely designing surface friction
properties using a conditional generative machine learning model, specifically
a diffusion denoising probabilistic model (DDPM). We created a dataset of
synthetic surfaces with frictional properties determined by molecular dynamics
simulations, which trained the DDPM to predict surface structures from desired
frictional outcomes. Unlike traditional trial-and-error and numerical
optimization methods, our approach directly yields surface designs meeting
specified frictional criteria with high accuracy and efficiency. This
advancement in material surface engineering demonstrates the potential of
machine learning in reducing the iterative nature of surface design processes.
Our findings not only provide a new pathway for precise surface property
tailoring but also suggest broader applications in material science where
surface characteristics are critical.
( 2
min )
Cooperative multi-agent reinforcement learning is a powerful tool to solve
many real-world cooperative tasks, but restrictions of real-world applications
may require training the agents in a fully decentralized manner. Due to the
lack of information about other agents, it is challenging to derive algorithms
that can converge to the optimal joint policy in a fully decentralized setting.
Thus, this research area has not been thoroughly studied. In this paper, we
seek to systematically review the fully decentralized methods in two settings:
maximizing a shared reward of all agents and maximizing the sum of individual
rewards of all agents, and discuss open questions and future research
directions.
( 2
min )
This paper proposes a classification framework aimed at identifying
correlations between job ad requirements and transversal skill sets, with a
focus on predicting the necessary skills for individual job descriptions using
a deep learning model. The approach involves data collection, preprocessing,
and labeling using ESCO (European Skills, Competences, and Occupations)
taxonomy. Hierarchical classification and multi-label strategies are used for
skill identification, while augmentation techniques address data imbalance,
enhancing model robustness. A comparison between results obtained with
English-specific and multi-language sentence embedding models reveals close
accuracy. The experimental case studies detail neural network configurations,
hyperparameters, and cross-validation results, highlighting the efficacy of the
hierarchical approach and the suitability of the multi-language model for the
diverse European job market. Thus, a new approach is proposed for the
hierarchical classification of transversal skills from job ads.
( 2
min )
Federated learning (FL) is an emerging paradigm for decentralized training of
machine learning models on distributed clients, without revealing the data to
the central server. The learning scheme may be horizontal, vertical or hybrid
(both vertical and horizontal). Most existing research work with deep neural
network (DNN) modelling is focused on horizontal data distributions, while
vertical and hybrid schemes are much less studied. In this paper, we propose a
generalized algorithm, FedEmb, for modelling vertical and hybrid DNN-based
learning. Compared with existing work, our algorithm is characterised by higher
inference accuracy, stronger privacy-preserving properties, and lower
client-server communication bandwidth demands. The experimental results show
that FedEmb is an effective method for tackling both split-feature and
split-subject-space decentralized problems: it achieves 0.3% to 4.2% higher
inference accuracy with limited privacy leakage for datasets stored on local
clients, and reduces time complexity by 88.9% compared with the vertical
baseline method.
( 3
min )
Algorithmic reproducibility measures the deviation in outputs of machine
learning algorithms upon minor changes in the training process. Previous work
suggests that first-order methods would need to trade off convergence rate
(gradient complexity) for better reproducibility. In this work, we challenge
this perception and demonstrate that both optimal reproducibility and
near-optimal convergence guarantees can be achieved for smooth convex
minimization and smooth convex-concave minimax problems under various
error-prone oracle settings. Particularly, given the inexact initialization
oracle, our regularization-based algorithms achieve the best of both worlds -
optimal reproducibility and near-optimal gradient complexity - for minimization
and minimax optimization. With the inexact gradient oracle, the near-optimal
guarantees also hold for minimax optimization. Additionally, with the
stochastic gradient oracle, we show that stochastic gradient descent ascent is
optimal in terms of both reproducibility and gradient complexity. We believe
our results contribute to an enhanced understanding of the
reproducibility-convergence trade-off in the context of convex optimization.
( 2
min )
We consider the sequential decision-making problem where the mean outcome is
a non-linear function of the chosen action. Compared with the linear model, two
curious phenomena arise in non-linear models: first, in addition to the
"learning phase" with a standard parametric rate for estimation or regret,
there is a "burn-in period" with a fixed cost determined by the non-linear
function; second, achieving the smallest burn-in cost requires new exploration
algorithms. For a special family of non-linear functions named ridge functions
in the literature, we derive upper and lower bounds on the optimal burn-in
cost, and in addition, on the entire learning trajectory during the burn-in
period via differential equations. In particular, a two-stage algorithm that
first finds a good initial action and then treats the problem as locally linear
is statistically optimal. In contrast, several classical algorithms, such as
UCB and algorithms relying on regression oracles, are provably suboptimal.
( 2
min )
We develop a new framework for embedding joint probability distributions in
tensor product reproducing kernel Hilbert spaces (RKHS). Our framework
accommodates a low-dimensional, normalized and positive model of a
Radon-Nikodym derivative, which we estimate from sample sizes of up to several
million data points, alleviating the inherent limitations of RKHS modeling.
Well-defined normalized and positive conditional distributions are natural
by-products to our approach. The embedding is fast to compute and accommodates
learning problems ranging from prediction to classification. Our theoretical
findings are supplemented by favorable numerical results.
( 2
min )
We propose a hierarchical correlation clustering method that extends the
well-known correlation clustering to produce hierarchical clusters applicable
to both positive and negative pairwise dissimilarities. We then study
unsupervised representation learning with such hierarchical
correlation clustering. For this purpose, we first investigate embedding the
respective hierarchy to be used for tree-preserving embedding and feature
extraction. Thereafter, we study the extension of minimax distance measures to
correlation clustering, as another representation learning paradigm. Finally,
we demonstrate the performance of our methods on several datasets.
( 2
min )
Machine learning models typically focus on specific targets like creating
classifiers, often based on known population feature distributions in a
business context. However, models calculating individual features adapt over
time to improve precision, introducing the concept of decoupling: shifting from
point evaluation to data distribution. We use calibration as a strategy for
decoupling machine learning (ML) classifiers from score-based
actions within business logic frameworks. To evaluate these strategies, we
perform a comparative analysis using a real-world business scenario and
multiple ML models. Our findings highlight the trade-offs and performance
implications of the approach, offering valuable insights for practitioners
seeking to optimize their decoupling efforts. In particular, the Isotonic and
Beta calibration methods stand out in scenarios in which there is a shift
between the training and testing data.
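Isotonic calibration, one of the two methods singled out above, is a one-liner
with scikit-learn; the scores and labels below are synthetic stand-ins:

    import numpy as np
    from sklearn.isotonic import IsotonicRegression

    rng = np.random.default_rng(0)
    raw = rng.uniform(size=1000)                       # uncalibrated scores
    y = (rng.uniform(size=1000) < raw**2).astype(int)  # miscalibrated on purpose

    # Fit a monotone map from raw scores to probabilities on held-out data;
    # downstream business rules can then keep their fixed thresholds even
    # when the underlying classifier is retrained.
    calibrator = IsotonicRegression(out_of_bounds="clip").fit(raw, y)
    calibrated = calibrator.predict(raw)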
( 2
min )
The zero-shot text-to-speech (TTS) method, based on speaker embeddings
extracted from reference speech using self-supervised learning (SSL) speech
representations, can reproduce speaker characteristics very accurately.
However, this approach suffers from degradation in speech synthesis quality
when the reference speech contains noise. In this paper, we propose a
noise-robust zero-shot TTS method. We incorporated adapters into the SSL model,
which we fine-tuned with the TTS model using noisy reference speech. In
addition, to further improve performance, we adopted a speech enhancement (SE)
front-end. With these improvements, our proposed SSL-based zero-shot TTS
achieved high-quality speech synthesis with noisy reference speech. Through
objective and subjective evaluations, we confirmed that the proposed method is
highly robust to noise in the reference speech and works effectively in
combination with SE.
( 2
min )
Uncertainty estimation is increasingly attractive for improving the
reliability of neural networks. In this work, we present novel credal-set
interval neural networks (CreINNs) designed for classification tasks. CreINNs
preserve the traditional interval neural network structure, capturing weight
uncertainty through deterministic intervals, while forecasting credal sets
using the mathematical framework of probability intervals. Experimental
validations on an out-of-distribution detection benchmark (CIFAR10 vs SVHN)
showcase that CreINNs outperform variational Bayesian neural networks (BNNs)
and deep ensembles (DEs) at epistemic uncertainty estimation.
Furthermore, CreINNs exhibit a notable reduction in computational complexity
compared to variational BNNs and demonstrate smaller model sizes than DEs.
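The interval half of the architecture is concrete enough to sketch. A linear
layer with interval weights propagates lower and upper output bounds using the
sign-split trick; this is generic interval arithmetic, not the CreINN code:

    import numpy as np

    def interval_linear(x, w_lo, w_hi, b_lo, b_hi):
        """Bounds on x @ W + b over all W in [w_lo, w_hi], b in [b_lo, b_hi].
        Positive inputs pair with one weight bound, negative with the other."""
        x_pos, x_neg = np.maximum(x, 0), np.minimum(x, 0)
        lo = x_pos @ w_lo + x_neg @ w_hi + b_lo
        hi = x_pos @ w_hi + x_neg @ w_lo + b_hi
        return lo, hi

    x = np.array([1.0, -2.0])
    w_lo = np.array([[0.1, 0.3], [0.2, 0.4]])
    w_hi = np.array([[0.2, 0.5], [0.3, 0.6]])
    lo, hi = interval_linear(x, w_lo, w_hi, np.zeros(2), np.zeros(2))
    print(lo, hi)   # elementwise bounds on the layer output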
( 2
min )
To handle the complexities of irregular and incomplete time series data, we
propose an invertible Neural Differential Equation (NDE)-based method. While
NDE-based methods are powerful for analyzing irregularly-sampled time series,
they typically do not guarantee reversible transformations in their standard
form. Our method combines a variation of Neural Controlled Differential
Equations (Neural CDEs) with Neural Flows, which ensures invertibility while
maintaining a lower computational burden. Additionally, it enables the
training of a dual latent space, enhancing the modeling of temporal dynamics.
Our research presents an advanced
framework that excels in both classification and interpolation tasks. At the
core of our approach is an enhanced dual latent states architecture, carefully
designed for high precision across various time series tasks. Empirical
analysis demonstrates that our method significantly outperforms existing
models. This work advances irregular time series analysis,
introducing innovative techniques and offering a versatile tool for diverse
practical applications.
( 2
min )
I introduce a unified framework for interpreting neural network classifiers
tailored toward automated scientific discovery. In contrast to neural
network-based regression, for classification, it is in general impossible to
find a one-to-one mapping from the neural network to a symbolic equation even
if the neural network itself bases its classification on a quantity that can be
written as a closed-form equation. In this paper, I embed a trained neural
network into an equivalence class of classifying functions that base their
decisions on the same quantity. I interpret neural networks by finding an
intersection between this equivalence class and human-readable equations
defined by the search space of symbolic regression. The approach is not limited
to classifiers or full neural networks and can be applied to arbitrary neurons
in hidden layers or latent spaces or to simplify the process of interpreting
neural network regressors.
( 2
min )
In this work, we provide a simulation algorithm to simulate from a
(multivariate) characteristic function, which is only accessible in a black-box
format. We construct a generative neural network, whose loss function exploits
a specific representation of the Maximum-Mean-Discrepancy metric to directly
incorporate the targeted characteristic function. The construction is universal
in the sense that it is independent of the dimension and that it does not
require any assumptions on the given characteristic function. Furthermore,
finite sample guarantees on the approximation quality in terms of the
Maximum-Mean Discrepancy metric are derived. The method is illustrated in a
short simulation study.
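A minimal version of such a loss can be written via Bochner's theorem:
Gaussian-kernel MMD equals the average squared gap between characteristic
functions over Gaussian-distributed frequencies. The sketch below compares the
empirical CF of samples with a black-box target CF (the Gaussian frequency
distribution and the toy target are assumptions of the illustration):

    import torch

    def cf_mmd_loss(samples, target_cf, n_freq=256, scale=1.0):
        """MMD-type discrepancy between samples and a characteristic function.
        Frequencies t ~ N(0, scale^2 I); target_cf(t) returns complex values."""
        t = torch.randn(n_freq, samples.shape[1]) * scale
        proj = samples @ t.T                                # (n, n_freq)
        ecf = torch.complex(torch.cos(proj), torch.sin(proj)).mean(dim=0)
        return (ecf - target_cf(t)).abs().pow(2).mean()

    # Toy target: standard Gaussian CF, phi(t) = exp(-|t|^2 / 2).
    target_cf = lambda t: torch.exp(-0.5 * (t**2).sum(dim=1)).to(torch.complex64)
    print(float(cf_mmd_loss(torch.randn(1024, 2), target_cf)))

In the generative setting, the samples come from the network and the loss is
backpropagated through them, pulling the generator toward the distribution
encoded by the target characteristic function.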
( 2
min )
Skin cancer is a global health concern, necessitating early and accurate
diagnosis for improved patient outcomes. This study introduces an approach to
skin cancer classification employing the Vision Transformer, a
state-of-the-art deep learning architecture renowned for its success in diverse
image analysis tasks. Utilizing the HAM10000 dataset of 10,015 meticulously
annotated skin lesion images, the model undergoes preprocessing for enhanced
robustness. The Vision Transformer, adapted to the skin cancer classification
task, leverages the self-attention mechanism to capture intricate spatial
dependencies, achieving superior performance over traditional deep learning
architectures. Segment Anything Model aids in precise segmentation of cancerous
areas, attaining high IoU and Dice coefficients. Extensive experiments
highlight the model's strong performance, particularly for the Google-based ViT
patch-32 variant,
which achieves 96.15% accuracy and showcases potential as an effective tool for
dermatologists in skin cancer diagnosis, contributing to advancements in
dermatological practices.
( 2
min )
Observational cohort studies are increasingly being used for comparative
effectiveness research to assess the safety of therapeutics. Recently, various
doubly robust methods have been proposed for average treatment effect
estimation by combining the treatment model and the outcome model via different
vehicles, such as matching, weighting, and regression. The key advantage of
doubly robust estimators is that they require either the treatment model or the
outcome model to be correctly specified to obtain a consistent estimator of
average treatment effects, and therefore lead to a more accurate and often more
precise inference. However, little work has been done to understand how doubly
robust estimators differ due to their unique strategies of using the treatment
and outcome models and how machine learning techniques can be combined to boost
their performance. Here we examine multiple popular doubly robust methods and
compare their performance using different treatment and outcome modeling via
extensive simulations and a real-world application. We found that incorporating
machine learning with doubly robust estimators such as the targeted maximum
likelihood estimator gives the best overall performance. Practical guidance on
how to apply doubly robust estimators is provided.
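For concreteness, the augmented inverse-probability-weighting (AIPW) estimator,
one standard doubly robust construction, can be assembled from any pair of
fitted treatment and outcome models; the scikit-learn learners below are
illustrative choices and cross-fitting is omitted for brevity:

    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.ensemble import GradientBoostingRegressor

    def aipw_ate(X, T, Y):
        """Doubly robust (AIPW) estimate of the average treatment effect."""
        e = LogisticRegression(max_iter=1000).fit(X, T).predict_proba(X)[:, 1]
        m1 = GradientBoostingRegressor().fit(X[T == 1], Y[T == 1]).predict(X)
        m0 = GradientBoostingRegressor().fit(X[T == 0], Y[T == 0]).predict(X)
        # Consistent if either the propensity model or the outcome models are
        # correctly specified.
        return np.mean(m1 - m0 + T * (Y - m1) / e - (1 - T) * (Y - m0) / (1 - e))

    rng = np.random.default_rng(0)
    X = rng.normal(size=(2000, 3))
    T = rng.binomial(1, 1 / (1 + np.exp(-X[:, 0])))
    Y = 2.0 * T + X[:, 0] + rng.normal(size=2000)   # true ATE = 2
    print(aipw_ate(X, T, Y))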
( 3
min )
The PGA TOUR continues to enhance the golf experience with real-time data that brings fans closer to the game. To deliver even richer experiences, they are pursuing the development of a next-generation ball position tracking system that automatically tracks the position of the ball on the green. The TOUR currently uses ShotLink powered by CDW, […]
( 9
min )
AI-backed virtual assistants face challenges in handling complex data structures. TaskWeaver helps users build assistants that understand diverse domain questions, follow examples, and efficiently execute customizable algorithms on complex data structures.
( 12
min )
The financial services industry is undergoing a significant transformation with the adoption of AI technologies. NVIDIA’s fourth annual State of AI in Financial Services Report provides insights into the current landscape and emerging trends for 2024. The report reveals that an overwhelming 91% of financial services companies are either assessing AI or already using it.
( 7
min )
GFN Thursday recaps the latest cloud announcements from CES 2024 — Day Pass memberships, Cloud G-SYNC technology, expanded NVIDIA Reflex support and more. The new year brings new adventures to the cloud for members, including Diablo IV and Overwatch 2 from Blizzard, Exoprimal from Capcom, Honkai: Star Rail from HoYoverse and Pax Dei from Mainframe […]
( 7
min )
In this paper we present a method for single-channel wind noise reduction
using our previously proposed diffusion-based stochastic regeneration model
combining predictive and generative modelling. We introduce a non-additive
speech in noise model to account for the non-linear deformation of the membrane
caused by the wind flow and possible clipping. We show that our stochastic
regeneration model outperforms other neural-network-based wind noise reduction
methods as well as purely predictive and generative models, on a dataset using
simulated and real-recorded wind noise. We further show that the proposed
method generalizes well by testing on an unseen dataset with real-recorded wind
noise. Audio samples, data generation scripts and code for the proposed methods
can be found online (https://uhh.de/inf-sp-storm-wind).
( 2
min )
We present a new fab-in-the-loop reinforcement learning algorithm for the
design of nano-photonic components that accounts for the imperfections present
in nanofabrication processes. As a demonstration of the potential of this
technique, we apply it to the design of photonic crystal grating couplers
fabricated on an air clad 220 nm silicon on insulator single etch platform.
This fab-in-the-loop algorithm improves the insertion loss from 8.8 to 3.24 dB.
The widest bandwidth designs produced using our fab-in-the-loop algorithm can
cover a 150 nm bandwidth with less than 10.2 dB of loss at their lowest point.
( 2
min )
Interactive segmentation is a crucial research area in medical image analysis
aiming to boost the efficiency of costly annotations by incorporating human
feedback. This feedback takes the form of clicks, scribbles, or masks and
allows for iterative refinement of the model output so as to efficiently guide
the system towards the desired behavior. In recent years, deep learning-based
approaches have propelled results to a new level causing a rapid growth in the
field with 121 methods proposed in the medical imaging domain alone. In this
review, we provide a structured overview of this emerging field featuring a
comprehensive taxonomy, a systematic review of existing methods, and an
in-depth analysis of current practices. Based on these contributions, we
discuss the challenges and opportunities in the field. For instance, we find
that there is a severe lack of comparison across methods which needs to be
tackled by standardized baselines and benchmarks.
( 3
min )
Advanced Language Models (ALMs) are increasingly used across diverse sectors,
owing to their impressive capability to generate high-quality content from
linguistic instructions. This study examines the deployment of ALMs in
electronic hardware design, with a specific emphasis on the synthesis and
enhancement of Verilog code. We introduce a framework designed to assess and
improve ALMs' productivity in this niche. The methodology commences with the
initial generation of Verilog code by ALMs, followed by a two-stage refinement
protocol. The first stage improves the code's operational and linguistic
precision, while the second stage aligns the code with Power-Performance-Area
(PPA) benchmarks, a pivotal component of proficient hardware design. This
two-pronged strategy, combining error remediation with PPA optimization,
yields substantial improvements in the quality of ALM-generated Verilog code.
Our framework achieves 81.37% linguistic accuracy and 62.0% operational
efficacy in code synthesis, surpassing current leading-edge techniques (73%
linguistic accuracy and 46% operational efficacy). These findings demonstrate
ALMs' aptitude for tackling complex technical domains and signal a positive
shift toward automating hardware design workflows.
( 3
min )
Machine learning, particularly graph learning, is gaining increasing
recognition for its transformative impact across various fields. One such
promising application is in the realm of molecule design and discovery, notably
within the pharmaceutical industry. Our survey offers a comprehensive overview
of state-of-the-art methods in molecule design, particularly focusing on
\emph{de novo} drug design, which incorporates (deep) graph learning
techniques. We categorize these methods into three distinct groups: \emph{i)}
\emph{all-at-once}, \emph{ii)} \emph{fragment-based}, and \emph{iii)}
\emph{node-by-node}. Additionally, we introduce some key public datasets and
outline the commonly used evaluation metrics for both the generation and
optimization of molecules. In the end, we discuss the existing challenges in
this field and suggest potential directions for future research.
( 2
min )
In this paper, we present a novel training approach called the Homotopy
Relaxation Training Algorithm (HRTA), aimed at accelerating the training
process in contrast to traditional methods. Our algorithm incorporates two key
mechanisms: one involves building a homotopy activation function that
seamlessly connects the linear activation function with the ReLU activation
function; the other technique entails relaxing the homotopy parameter to
enhance the training refinement process. We have conducted an in-depth analysis
of this novel method within the context of the neural tangent kernel (NTK),
revealing significantly improved convergence rates. Our experimental results,
especially when considering networks with larger widths, validate the
theoretical conclusions. The proposed HRTA also shows potential for extension
to other activation functions and deep neural networks.
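The homotopy activation itself is a one-line interpolation; a PyTorch sketch
with a toy relaxation schedule (the schedule is our assumption, not the
paper's):

    import torch
    import torch.nn as nn

    class HomotopyReLU(nn.Module):
        """Activation that morphs from the identity (t = 0) to ReLU (t = 1)."""

        def __init__(self, t=0.0):
            super().__init__()
            self.t = t

        def forward(self, x):
            return (1 - self.t) * x + self.t * torch.relu(x)

    act = HomotopyReLU()
    for epoch in range(5):
        act.t = min(1.0, epoch / 3)   # relax the homotopy parameter over time
        y = act(torch.randn(4, 8))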
( 2
min )
The conservation of hydrological resources involves continuously monitoring
their contamination. A multi-agent system composed of autonomous surface
vehicles is proposed in this paper to efficiently monitor the water quality. To
achieve safe control of the fleet, the fleet policy should be able to act
based on both measurements and the fleet state. It is proposed to use Local
Gaussian Processes and Deep Reinforcement Learning to jointly obtain effective
monitoring policies. Unlike classical global Gaussian processes, local Gaussian
processes can accurately model information with dissimilar spatial
correlations, capturing the water quality information more accurately. A deep
convolutional policy is proposed that bases its decisions on observations of
the mean and variance of this model, by means of an information-gain reward.
Using a Double Deep Q-Learning algorithm, agents are trained to
minimize the estimation error in a safe manner thanks to a Consensus-based
heuristic. Simulation results indicate an improvement of up to 24% in terms of
the mean absolute error with the proposed models. Also, training results with
1-3 agents indicate that our proposed approach returns 20% and 24% smaller
average estimation errors for, respectively, monitoring water quality variables
and monitoring algae blooms, as compared to state-of-the-art approaches.
( 2
min )
Federated Learning (FL) is a promising distributed learning mechanism which
still faces two major challenges, namely privacy breaches and system
efficiency. In this work, we reconceptualize the FL system from the perspective
of network information theory, and formulate an original FL communication
framework, FedNC, which is inspired by Network Coding (NC). The main idea of
FedNC is mixing the information of the local models by making random linear
combinations of the original parameters, before uploading for further
aggregation. Due to the benefits of the coding scheme, both theoretical and
experimental analysis indicate that FedNC improves the performance of
traditional FL in several important ways, including security, efficiency, and
robustness. To the best of our knowledge, this is the first framework where NC
is introduced in FL. As FL continues to evolve within practical network
frameworks, more variants can be further designed based on FedNC.
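The coding step is classical random linear network coding applied to parameter
vectors. The toy sketch below (Gaussian coefficients and exact least-squares
decoding are assumptions of the illustration, not necessarily FedNC's scheme)
shows how mixed uploads still let the receiver recover the aggregate:

    import numpy as np

    rng = np.random.default_rng(0)
    n_clients, dim = 4, 10
    thetas = rng.normal(size=(n_clients, dim))   # local model parameters

    # Each coded packet is a random linear combination of local models, so no
    # single packet exposes an individual client's parameters in the clear.
    A = rng.normal(size=(n_clients, n_clients))  # mixing coefficients
    coded = A @ thetas

    # Knowing the coefficients, the receiver inverts the (w.h.p. invertible)
    # system and averages, matching plain federated averaging exactly.
    avg = np.linalg.solve(A, coded).mean(axis=0)
    print(np.allclose(avg, thetas.mean(axis=0)))  # True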
( 2
min )
The problem of high-quality drought forecasting up to a year in advance is
critical for agriculture planning and insurance. Yet, it is still unsolved with
reasonable accuracy due to data complexity and aridity stochasticity. We tackle
drought data by introducing an end-to-end approach that adopts a
spatio-temporal neural network model with accessible open monthly climate data
as the input.
Our systematic research employs diverse proposed models and five distinct
environmental regions as a testbed to evaluate the efficacy of the Palmer
Drought Severity Index (PDSI) prediction. Key aggregated findings are the
exceptional performance of a Transformer model, EarthFormer, in making accurate
short-term (up to six months) forecasts. At the same time, the Convolutional
LSTM excels in longer-term forecasting. The models achieved ROC AUC scores of
0.948 for one-month-ahead and 0.617 for twelve-month-ahead forecasts.
( 2
min )
As the most basic application and implementation of deep learning, image
classification has grown in popularity. Various datasets are provided by
renowned data science communities for benchmarking machine learning algorithms
and pre-trained models. The ASSIRA Cats & Dogs dataset is one of them and is
being used in this research for its overall acceptance and benchmark standards.
A comparison of various pre-trained models is demonstrated by using different
types of optimizers and loss functions. Hyper-parameters are changed to gain
the best result from a model. By applying this approach, we obtained higher
accuracy without major changes in the training model. To run the experiment, we
used three different computer architectures: a laptop equipped with NVIDIA
GeForce GTX 1070, a laptop equipped with NVIDIA GeForce RTX 3080Ti, and a
desktop equipped with NVIDIA GeForce RTX 3090. The acquired results demonstrate
higher accuracy than previously reported experiments on this dataset. The
highest accuracy, 99.65%, was obtained using NASNet Large.
( 2
min )
Three classes of architectures for time series prediction were tested.
They differ by input layers which contain either convolutional, LSTM, or dense
hypercomplex layers for 4D algebras. The input consisted of four related stock
market time series, and the task was to predict one of them. The optimization of
hyperparameters related to the classes of architectures was performed in order
to compare the best neural networks within the class. The results show that in
most cases, the architecture with a hypercomplex dense layer provides similar
MAE accuracy to the other architectures, but with considerably fewer trainable
parameters. Thanks to this, hypercomplex neural networks can be trained and can
process data faster than the other tested architectures. Moreover, the order of
the input time series has an impact on effectiveness.
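The parameter saving comes from weight sharing in the Hamilton product: one
quaternion weight of four real numbers replaces a 4x4 block of sixteen, so a
quaternion dense layer needs a quarter of the weights of a real one. A generic
sketch (not the paper's exact layer):

    import numpy as np

    def quaternion_dense(x, A, B, C, D):
        """Quaternion dense layer via the Hamilton product. x holds 4*n
        features as n quaternions (r, i, j, k chunks); A..D are (m, n)."""
        r, i, j, k = np.split(x, 4)
        out_r = A @ r - B @ i - C @ j - D @ k
        out_i = A @ i + B @ r + C @ k - D @ j
        out_j = A @ j - B @ k + C @ r + D @ i
        out_k = A @ k + B @ j - C @ i + D @ r
        return np.concatenate([out_r, out_i, out_j, out_k])

    rng = np.random.default_rng(0)
    x = rng.normal(size=4 * 3)                    # e.g. four related series
    A, B, C, D = (rng.normal(size=(2, 3)) for _ in range(4))
    print(quaternion_dense(x, A, B, C, D).shape)  # (8,)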
( 2
min )
This study develops a graph search algorithm to find the optimal
discrimination path for the binary classification problem. The objective
function is defined as the difference of variations between the true positive
(TP) and false positive (FP). It uses the depth first search (DFS) algorithm to
find the top-down paths for discrimination. It proposes a dynamic optimization
procedure to optimize TP at the upper levels and then reduce FP at the lower
levels. To accelerate computing speed with improving accuracy, it proposes a
reduced histogram algorithm with variable bin size instead of looping over all
data points, to find the feature threshold of discrimination. The algorithm is
applied on top of a Support Vector Machine (SVM) model for a binary
classification problem on whether a person is fit or unfit. It significantly
improves TP and reduces FP of the SVM results (e.g., reduced FP by 90% with a
loss of only 5% TP). The graph search auto-generates 39 ranked discrimination
paths within 9 seconds on an input of 328,464 objects in total, using a
dual-core laptop computer with a 2.59 GHz processor.
( 2
min )
With the rapid increase in the number of Anthropogenic Space Objects (ASOs),
Low Earth Orbit (LEO) is facing significant congestion, thereby posing
challenges to space operators and risking the viability of the space
environment for varied uses. Current models for examining this evolution, while
detailed, are computationally demanding. To address these issues, we propose a
novel machine learning-based model, as an extension of the MIT Orbital Capacity
Tool (MOCAT). This advanced model is designed to accelerate the propagation of
ASO density distributions, and it is trained on hundreds of simulations
generated by an established and accurate model of the space environment
evolution. We study how different deep learning-based solutions can potentially
be good candidates for ASO propagation and manage the high-dimensionality of
the data. To assess the model's capabilities, we conduct experiments in
long-term forecasting scenarios (around 100 years), analyze how and why the
performance degrades over time, and discuss potential improvements.
( 2
min )
We use Koopman theory for data-driven model reduction of nonlinear dynamical
systems with controls. We propose generic model structures combining
delay-coordinate encoding of measurements and full-state decoding to integrate
reduced Koopman modeling and state estimation. We present a deep-learning
approach to train the proposed models. A case study demonstrates that our
approach provides accurate control models and enables real-time capable
nonlinear model predictive control of a high-purity cryogenic distillation
column.
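A minimal data-driven Koopman model with delay coordinates and inputs can be
fit by least squares; this EDMD-style sketch reflects the generic recipe rather
than the paper's deep-learning training:

    import numpy as np

    def fit_delay_koopman(y, u, n_delays=3):
        """Fit z_{t+1} = A z_t + B u_t, where z_t stacks the current and
        n_delays past measurements (a simple delay-coordinate lifting).
        y: (T, p) measurements, u: (T, q) control inputs."""
        T = y.shape[0]
        Z = np.hstack([y[i : T - n_delays + i] for i in range(n_delays + 1)])
        Z0, Z1, U0 = Z[:-1], Z[1:], u[n_delays : T - 1]
        AB, *_ = np.linalg.lstsq(np.hstack([Z0, U0]), Z1, rcond=None)
        d = Z0.shape[1]
        return AB[:d].T, AB[d:].T   # A: (d, d), B: (d, q)

    rng = np.random.default_rng(0)
    u = rng.normal(size=(500, 1))
    y = np.cumsum(0.1 * u, axis=0) + 0.01 * rng.normal(size=(500, 1))
    A, B = fit_delay_koopman(y, u)

The linear-in-the-lifted-state structure is what makes the resulting surrogate
cheap enough for real-time model predictive control.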
( 2
min )
In this paper, we present results on improving out-of-domain weather
prediction and uncertainty estimation as part of the \texttt{Shifts Challenge
on Robustness and Uncertainty under Real-World Distributional Shift}.
We find that by leveraging a mixture of experts together with an advanced
data augmentation technique borrowed from the computer vision domain, combined
with robust \textit{post-hoc} calibration of predictive
uncertainties, we can potentially achieve more accurate and better-calibrated
results with deep neural networks than with boosted tree models for tabular
data. We quantify our predictions using several metrics and propose several
future lines of inquiry and experimentation to boost performance.
( 2
min )
GANStrument, exploiting GANs with a pitch-invariant feature extractor and
instance conditioning technique, has shown remarkable capabilities in
synthesizing realistic instrument sounds. To further improve the reconstruction
ability and pitch accuracy to enhance the editability of user-provided sound,
we propose HyperGANStrument, which introduces a pitch-invariant hypernetwork to
modulate the weights of a pre-trained GANStrument generator, given a one-shot
sound as input. The hypernetwork modulation provides feedback for the generator
in the reconstruction of the input sound. In addition, we take advantage of an
adversarial fine-tuning scheme for the hypernetwork to improve the
reconstruction fidelity and generation diversity of the generator. Experimental
results show that the proposed model not only enhances the generation
capability of GANStrument but also significantly improves the editability of
synthesized sounds. Audio examples are available at the online demo page.
( 2
min )
Federated Learning (FL) has become an established technique to facilitate
privacy-preserving collaborative training. However, new approaches to FL often
discuss their contributions involving small deep-learning models only. With the
tremendous success of transformer models, the following question arises: What
is necessary to operationalize foundation models in an FL application? Knowing
that computation and communication often take up similar amounts of time in FL,
we introduce a novel taxonomy focused on computational and communication
efficiency methods in FL applications. These methods aim to optimize the
training time and reduce communication between clients and the server. We
also look at the current state of widely used FL frameworks and discuss future
research potentials based on existing approaches in FL research and beyond.
( 2
min )
The success of drug discovery and development relies on the precise
prediction of molecular activities and properties. While in silico molecular
property prediction has shown remarkable potential, its use has been limited so
far to assays for which large amounts of data are available. In this study, we
use a fine-tuned large language model to integrate biological assays based on
their textual information, coupled with Barlow Twins, a Siamese neural network
using a novel self-supervised learning approach. This architecture uses both
assay information and molecular fingerprints to extract the true molecular
information. TwinBooster enables the prediction of properties of unseen
bioassays and molecules by providing state-of-the-art zero-shot learning tasks.
Remarkably, our artificial intelligence pipeline shows excellent performance on
the FS-Mol benchmark. This breakthrough demonstrates the application of deep
learning to critical property prediction tasks where data is typically scarce.
By accelerating the early identification of active molecules in drug discovery
and development, this method has the potential to help streamline the
identification of novel therapeutics.
( 2
min )
In this paper, we first extend the result of FL93 and prove universal
consistency for a classification rule based on wide and deep ReLU neural
networks trained on the logistic loss. Unlike the approach in FL93 that
decomposes the estimation and empirical error, we directly analyze the
classification risk based on the observation that a realization of a neural
network that is wide enough is capable of interpolating an arbitrary number of
points. Secondly, we give sufficient conditions for a class of probability
measures under which classifiers based on neural networks achieve minimax
optimal rates of convergence. Our result is motivated from the practitioner's
observation that neural networks are often trained to achieve 0 training error,
which is the case for our proposed neural network classifiers. Our proofs hinge
on recent developments in empirical risk minimization and on approximation
rates of deep ReLU neural networks for various function classes of interest.
Applications to classical smoothness function spaces illustrate the
usefulness of our result.
( 2
min )
Deep reinforcement learning (DRL) methods have recently shown promise in path
planning tasks. However, when dealing with global planning tasks, these methods
face serious challenges such as poor convergence and generalization. To this
end, we propose an attention-enhanced DRL method called LOPA (Learn Once Plan
Arbitrarily) in this paper. Firstly, we analyze the reasons for these problems
from the perspective of the DRL observation, revealing that the traditional
design causes the DRL agent to be distracted by irrelevant map information. Secondly, we
develop the LOPA which utilizes a novel attention-enhanced mechanism to attain
an improved attention capability towards the key information of the
observation. Such a mechanism is realized by two steps: (1) an attention model
is built to transform the DRL's observation into two dynamic views: local and
global, significantly guiding the LOPA to focus on the key information on the
given maps; (2) a dual-channel network is constructed to process these two
views and integrate them to attain an improved reasoning capability. The LOPA
is validated via multi-objective global path planning experiments. The results
suggest that LOPA attains improved convergence and generalization performance
as well as high path-planning efficiency.
( 2
min )
This chapter provides a comprehensive overview of the pragmatic aspects
involved in organizing AI competitions. We begin by discussing strategies to
incentivize participation, touching upon effective communication techniques,
aligning with trending topics in the field, structuring awards, potential
recruitment opportunities, and more. We then turn to community engagement,
organizational best practices, and effective means of disseminating challenge
outputs. Lastly, the chapter addresses logistics, covering costs, required
manpower, and resource allocation for effectively managing and executing a
challenge. By examining these practical problems,
readers will gain actionable insights to navigate the multifaceted landscape of
AI competition organization, from inception to completion.
( 2
min )
We propose a methodology, based on machine learning and optimization, for
selecting a solver configuration for a given instance. First, we employ a set
of solved instances and configurations in order to learn a performance function
of the solver. Secondly, we formulate a mixed-integer nonlinear program where
the objective/constraints explicitly encode the learnt information, and which
we solve, upon the arrival of an unknown instance, to find the best solver
configuration for that instance, based on the performance function. The main
novelty of our approach lies in the fact that the configuration set search
problem is formulated as a mathematical program, which allows us to a) enforce
hard dependence and compatibility constraints on the configurations, and b)
solve it efficiently with off-the-shelf optimization tools.
( 2
min )
We study the approximation capacity of some variation spaces corresponding to
shallow ReLU$^k$ neural networks. It is shown that sufficiently smooth
functions are contained in these spaces with finite variation norms. For
functions with less smoothness, the approximation rates in terms of the
variation norm are established. Using these results, we are able to prove the
optimal approximation rates in terms of the number of neurons for shallow
ReLU$^k$ neural networks. It is also shown how these results can be used to
derive approximation bounds for deep neural networks and convolutional neural
networks (CNNs). As applications, we study convergence rates for nonparametric
regression using three ReLU neural network models: shallow neural network,
over-parameterized neural network, and CNN. In particular, we show that shallow
neural networks can achieve the minimax optimal rates for learning H\"older
functions, which complements recent results for deep neural networks. It is
also proven that over-parameterized (deep or shallow) neural networks can
achieve nearly optimal rates for nonparametric regression.
( 2
min )
Graph Neural Networks (GNNs) are able to achieve high classification accuracy
on many important real world datasets, but provide no rigorous notion of
predictive uncertainty. Quantifying the confidence of GNN models is difficult
due to the dependence between datapoints induced by the graph structure. We
leverage recent advances in conformal prediction to construct prediction sets
for node classification in inductive learning scenarios. We do this by taking
an existing approach for conformal classification that relies on
\textit{exchangeable} data and modifying it by appropriately weighting the
conformal scores to reflect the network structure. We show through experiments
on standard benchmark datasets using popular GNN models that our approach
provides tighter and better calibrated prediction sets than a naive application
of conformal prediction.
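Stripped of the graph-aware weighting, the underlying split-conformal recipe
fits in a few lines; this uniform-weight baseline (with Dirichlet noise
standing in for GNN softmax outputs) is the procedure the paper then modifies
with structure-aware weights:

    import numpy as np

    def conformal_sets(cal_probs, cal_labels, test_probs, alpha=0.1):
        """Split conformal classification. Score: 1 - prob of the true class;
        the quantile uses the standard (n + 1) finite-sample correction."""
        n = len(cal_labels)
        scores = 1.0 - cal_probs[np.arange(n), cal_labels]
        q = np.quantile(scores, np.ceil((n + 1) * (1 - alpha)) / n,
                        method="higher")
        return [np.flatnonzero(1.0 - p <= q) for p in test_probs]

    rng = np.random.default_rng(0)
    cal_probs = rng.dirichlet(np.ones(5), size=200)
    cal_labels = rng.integers(0, 5, size=200)
    test_probs = rng.dirichlet(np.ones(5), size=3)
    print(conformal_sets(cal_probs, cal_labels, test_probs))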
( 2
min )
The constantly increasing capabilities of artificial intelligence (AI) open
new possibilities for human-AI collaboration. One promising approach to
leverage existing complementary capabilities is allowing humans to delegate
individual instances to the AI. However, enabling humans to delegate instances
effectively requires them to assess both their own and the AI's capabilities in
the context of the given task. In this work, we explore the effects of
providing contextual information on human decisions to delegate instances to an
AI. We find that providing participants with contextual information
significantly improves the human-AI team performance. Additionally, we show
that the delegation behavior changes significantly when participants receive
varying types of contextual information. Overall, this research advances the
understanding of human-AI interaction in human delegation and provides
actionable insights for designing more effective collaborative systems.
( 2
min )
In this article, we consider convergence of stochastic gradient descent
schemes (SGD), including momentum stochastic gradient descent (MSGD), under
weak assumptions on the underlying landscape. More explicitly, we show that on
the event that the SGD stays bounded we have convergence of the SGD if there is
only a countable number of critical points or if the objective function
satisfies Lojasiewicz-inequalities around all critical levels as all analytic
functions do. In particular, we show that for neural networks with analytic
activation function such as softplus, sigmoid and the hyperbolic tangent, SGD
converges on the event of staying bounded, if the random variables modelling
the signal and response in the training are compactly supported.
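For reference, one common form of the Lojasiewicz inequality invoked above, stated as a LaTeX sketch (constants and the exponent range vary across statements):

```latex
% One common form of the Lojasiewicz inequality at a critical point x^*:
% there exist C > 0, \theta \in [1/2, 1), and a neighborhood U of x^* with
\[
  |f(x) - f(x^*)|^{\theta} \le C\,\|\nabla f(x)\| \quad \text{for all } x \in U.
\]
% Every analytic function satisfies such an inequality around each of its
% critical points, which is what the convergence result above exploits.
```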
( 2
min )
In this paper we propose to quantify execution time variability of programs
using statistical dispersion parameters. We show how the execution time
variability can be exploited in mixed criticality real-time systems. We propose
a heuristic to compute the execution time budget to be allocated to each low
criticality real-time task according to its execution time variability. We show
using experiments and simulations that the proposed heuristic reduces the
probability of exceeding the allocated budget compared to algorithms which do
not take into account the execution time variability parameter.
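Since the abstract does not spell out the heuristic, the following Python sketch illustrates one dispersion-aware budget rule; the quantile base, the interquartile-range margin, and the factor `k` are assumptions for illustration, not the paper's actual formula.

```python
import numpy as np

def execution_budget(times, base_quantile=0.9, k=1.0):
    """Illustrative rule: an empirical quantile of observed execution times
    plus a margin proportional to a dispersion parameter (here the IQR)."""
    times = np.asarray(times, dtype=float)
    iqr = np.percentile(times, 75) - np.percentile(times, 25)
    return float(np.percentile(times, 100 * base_quantile) + k * iqr)

# Tasks with higher execution-time variability receive a larger margin,
# which lowers the probability of exceeding the allocated budget.
```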
( 2
min )
In this paper, we first extend the result of FL93 and prove universal
consistency for a classification rule based on wide and deep ReLU neural
networks trained on the logistic loss. Unlike the approach in FL93 that
decomposes the estimation and empirical error, we directly analyze the
classification risk based on the observation that a realization of a neural
network that is wide enough is capable of interpolating an arbitrary number of
points. Secondly, we give sufficient conditions for a class of probability
measures under which classifiers based on neural networks achieve minimax
optimal rates of convergence. Our result is motivated by the practitioner's
observation that neural networks are often trained to achieve 0 training error,
which is the case for our proposed neural network classifiers. Our proofs hinge
on recent developments in empirical risk minimization and on approximation
rates of deep ReLU neural networks for various function classes of interest.
Applications to classical function spaces of smoothness illustrate the
usefulness of our result.
( 2
min )
This post is co-written with Jayadeep Pabbisetty, Sr. Specialist Data Engineering at Merck, and Prabakaran Mathaiyan, Sr. ML Engineer at Tiger Analytics. The large machine learning (ML) model development lifecycle requires a scalable model release process similar to that of software development. Model developers often work together in developing ML models and require a robust […]
( 8
min )
Editor’s note: All papers referenced here represent collaborations throughout Microsoft and across academia and industry that include authors who contribute to Aether, the Microsoft internal advisory body for AI ethics and effects in engineering and research. A surge of generative AI models in the past year has fueled much discussion about the impact of artificial […]
The post Advancing transparency: Updates on responsible AI research appeared first on Microsoft Research.
( 18
min )
NVIDIA continues to be among America’s very best places to work as judged by employees themselves, rising to second place on Glassdoor’s list of best employers for 2024. This is the fourth consecutive year NVIDIA has been among the top five on the closely watched list, which is based on anonymous employee reviews about their…
( 5
min )
Memory constraint of always-on devices is one of the major concerns when
deploying speech processing models on these devices. While larger models
trained with sufficiently large amount of data generally perform better, making
them fit in the device memory is a demanding challenge. In this paper, we aim
to reduce model size by reparameterizing model weights across Transformer
encoder layers and assuming a special weight composition and structure. More
specifically, inspired by ResNet and the more recent LoRA work, we propose an
approach named ResidualTransformer, where each weight matrix in a Transformer
layer comprises 1) a shared full-rank component with its adjacent layers, and
2) a unique low-rank component to itself. The low-rank matrices only account
for a small amount of model size increase. In addition, we add diagonal weight
matrices to improve modeling capacity of the low-rank matrices. Experiments of
our 10k-hour speech recognition and speech translation tasks show that the
Transformer encoder size can be reduced by ~3X with very slight performance
degradation.
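A minimal PyTorch sketch of the weight composition described above; the shapes, rank, and initialization are illustrative assumptions, and the exact cross-layer sharing pattern in ResidualTransformer may differ.

```python
import torch
import torch.nn as nn

class ResidualWeight(nn.Module):
    """Sketch of one reparameterized weight: W = W_shared + U @ V + diag(d),
    where W_shared is a full-rank component shared with adjacent layers,
    U @ V is a layer-specific low-rank term, and diag(d) adds per-dimension
    capacity at negligible size cost."""
    def __init__(self, shared: nn.Parameter, dim: int, rank: int = 8):
        super().__init__()
        self.shared = shared                      # shared full-rank component
        self.U = nn.Parameter(torch.zeros(dim, rank))
        self.V = nn.Parameter(torch.randn(rank, dim) * 0.02)
        self.d = nn.Parameter(torch.zeros(dim))   # diagonal correction

    def forward(self, x):
        W = self.shared + self.U @ self.V + torch.diag(self.d)
        return x @ W.T

dim = 512
shared = nn.Parameter(torch.randn(dim, dim) * 0.02)  # reused by two layers
layer_a, layer_b = ResidualWeight(shared, dim), ResidualWeight(shared, dim)
```

Only the low-rank and diagonal terms are unique per layer, which is where the roughly 3x size reduction comes from.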
( 2
min )
The in-context learning ability of large language models (LLMs) enables them
to generalize to novel downstream tasks with relatively few labeled examples.
However, they require enormous computational resources to be deployed.
Alternatively, smaller models can solve specific tasks if fine-tuned with
enough labeled examples. These examples, however, are expensive to obtain. In
pursuit of the best of both worlds, we study synthetic data generation of
fine-tuning training data via fine-tuned teacher LLMs to improve the downstream
performance of much smaller models. In four text classification and two text
generation tasks, we find that both data generation and annotation dramatically
improve the respective downstream model's performance, occasionally
necessitating only a minor fraction of the original training dataset.
( 2
min )
Personalized recommendations form an important part of today's internet
ecosystem, helping artists and creators to reach interested users, and helping
users to discover new and engaging content. However, many users today are
skeptical of platforms that personalize recommendations, in part due to
historically careless treatment of personal data and data privacy. Now,
businesses that rely on personalized recommendations are entering a new
paradigm, where many of their systems must be overhauled to be privacy-first.
In this article, we propose an algorithm for personalized recommendations that
facilitates both precise and differentially-private measurement. We consider
advertising as an example application, and conduct offline experiments to
quantify how the proposed privacy-preserving algorithm affects key metrics
related to user experience, advertiser value, and platform revenue compared to
the extremes of both (private) non-personalized and non-private, personalized
implementations.
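The abstract does not detail the proposed algorithm, but the measurement side of such systems typically rests on a standard mechanism; here is a minimal sketch of the Laplace mechanism, with illustrative epsilon and sensitivity values.

```python
import numpy as np

def private_count(true_count, epsilon=1.0, sensitivity=1.0):
    """Laplace mechanism: adding Laplace(sensitivity / epsilon) noise makes
    the released aggregate epsilon-differentially private."""
    return true_count + np.random.laplace(loc=0.0, scale=sensitivity / epsilon)

# e.g., measure ad conversions privately while serving stays personalized
noisy_conversions = private_count(1234, epsilon=0.5)
```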
( 2
min )
In a practical wireless network, end users do not communicate directly with
the central server across its many tiers, user devices have limited
computation and battery power, and the serving base station (BS) has a fixed
bandwidth. Owing to these practical constraints and system models, this paper
leverages model pruning and proposes a pruning-enabled hierarchical federated
learning (PHFL) in heterogeneous networks (HetNets). We first derive an upper
bound of the convergence rate that clearly demonstrates the impact of the model
pruning and wireless communications between the clients and the associated BS.
Then we jointly optimize the model pruning ratio, central processing unit (CPU)
frequency and transmission power of the clients in order to minimize the
controllable terms of the convergence bound under strict delay and energy
constraints. However, since the original problem is not convex, we perform
successive convex approximation (SCA) and jointly optimize the parameters for
the relaxed convex problem. Through extensive simulation, we validate the
effectiveness of our proposed PHFL algorithm in terms of test accuracy, wall
clock time, energy consumption and bandwidth requirement.
( 2
min )
We present the Multi-Modal Discussion Transformer (mDT), a novel method for
detecting hate speech in online social networks such as Reddit discussions. In
contrast to traditional comment-only methods, our approach to labelling a
comment as hate speech involves a holistic analysis of text and images grounded
in the discussion context. This is done by leveraging graph transformers to
capture the contextual relationships in the discussion surrounding a comment
and grounding the interwoven fusion layers that combine text and image
embeddings instead of processing modalities separately. To evaluate our work,
we present a new dataset, HatefulDiscussions, comprising complete multi-modal
discussions from multiple online communities on Reddit. We compare the
performance of our model to baselines that only process individual comments and
conduct extensive ablation studies.
( 2
min )
The separate tasks of denoising, least squares expectation, and manifold
learning can often be posed in a common setting of finding the conditional
expectations arising from a product of two random variables. This paper focuses
on this more general problem and describes an operator theoretic approach to
estimating the conditional expectation. Kernel integral operators are used as a
compactification tool, to set up the estimation problem as a linear inverse
problem in a reproducing kernel Hilbert space. This equation is shown to have
solutions that allow numerical approximation, thus guaranteeing the convergence
of data-driven implementations. The overall technique is easy to implement,
and its successful application to some real-world problems is also shown.
( 2
min )
Directly predicting human epidermal growth factor receptor 2 (HER2) status
from widely available hematoxylin and eosin (HE)-stained whole slide images
(WSIs) can reduce technical costs and expedite treatment selection. Accurately
predicting HER2 requires large collections of multi-site WSIs. Federated
learning enables collaborative training of these WSIs without gigabyte-size
WSIs transportation and data privacy concerns. However, federated learning
encounters challenges in addressing label imbalance in multi-site WSIs from the
real world. Moreover, existing WSI classification methods cannot simultaneously
exploit local context information and long-range dependencies in the site-end
feature representation of federated learning. To address these issues, we
present a point transformer with federated learning for multi-site HER2 status
prediction from HE-stained WSIs. Our approach incorporates two novel designs.
We propose a dynamic label distribution strategy and an auxiliary classifier,
which helps to establish a well-initialized model and mitigate label
distribution variations across sites. Additionally, we propose a farthest
cosine sampling based on cosine distance. It can sample the most distinctive
features and capture the long-range dependencies. Extensive experiments and
analysis show that our method achieves state-of-the-art performance at four
sites with a total of 2687 WSIs. Furthermore, we demonstrate that our model can
generalize to two unseen sites with 229 WSIs.
( 3
min )
We implement a Bayesian inference process for Neural Networks to model the
time to failure of highly reliable weapon systems with interval-censored data
and time-varying covariates. We analyze and benchmark our approach, LaplaceNN,
on synthetic and real datasets with standard classification metrics such as
Receiver Operating Characteristic (ROC) Area Under Curve (AUC), Precision-Recall
(PR) AUC, and reliability curve visualizations.
( 2
min )
This research introduces a sophisticated transfer learning model based on
Google's MobileNetV2 for breast cancer tumor classification into normal,
benign, and malignant categories, utilizing a dataset of 1576 ultrasound images
(265 normal, 891 benign, 420 malignant). The model achieves an accuracy of
0.82, precision of 0.83, recall of 0.81, ROC-AUC of 0.94, PR-AUC of 0.88, and
MCC of 0.74. It examines image intensity distributions and misclassification
errors, offering improvements for future applications. Addressing dataset
imbalances, the study ensures a generalizable model. This work, using a dataset
from Baheya Hospital, Cairo, Egypt, compiled by Walid Al-Dhabyani et al.,
emphasizes MobileNetV2's potential in medical imaging, aiming to improve
diagnostic precision in oncology. Additionally, the paper explores
Streamlit-based deployment for real-time tumor classification, demonstrating
MobileNetV2's applicability in medical imaging and setting a benchmark for
future research in oncology diagnostics.
( 2
min )
Tiger conservation necessitates the strategic deployment of multifaceted
initiatives encompassing the preservation of ecological habitats, anti-poaching
measures, and community involvement for sustainable growth in the tiger
population. With the advent of artificial intelligence, tiger surveillance can
be automated using object detection. In this paper, an accurate illumination
invariant framework is proposed based on EnlightenGAN and YOLOv8 for tiger
detection. The fine-tuned YOLOv8 model achieves a mAP score of 61% without
illumination enhancement. The illumination enhancement improves the mAP by
0.7%. The approaches elevate the state-of-the-art performance on the ATRW
dataset by approximately 6% to 7%.
( 2
min )
Although gradient descent with momentum is widely used in modern deep
learning, a concrete understanding of its effects on the training trajectory
still remains elusive. In this work, we empirically show that momentum gradient
descent with a large learning rate and learning rate warmup displays large
catapults, driving the iterates towards flatter minima than those found by
gradient descent. We then provide empirical evidence and theoretical intuition
that the large catapult is caused by momentum "amplifying" the
self-stabilization effect (Damian et al., 2023).
( 2
min )
In this work, we propose an end-to-end adaptive sampling neural network
(MMPDE-Net) based on the moving mesh method, which can adaptively generate new
sampling points by solving the moving mesh PDE. This model focuses on improving
the quality of sampling points generation. Moreover, we develop an iterative
algorithm based on MMPDE-Net, which makes the sampling points more precise and
controllable. Since MMPDE-Net is a framework independent of the deep learning
solver, we combine it with physics-informed neural networks (PINN) to propose
moving sampling PINN (MS-PINN) and demonstrate its effectiveness by error
analysis under some assumptions. Finally, we demonstrate the performance
improvement of MS-PINN compared to PINN through numerical experiments on four
typical examples, which numerically verify the effectiveness of our method.
( 2
min )
We have formulated a family of machine learning problems as the time
evolution of Parametric Probabilistic Models (PPMs), inherently rendering a
thermodynamic process. Our primary motivation is to leverage the rich toolbox
of thermodynamics of information to assess the information-theoretic content of
learning a probabilistic model. We first introduce two information-theoretic
metrics: Memorized-information (M-info) and Learned-information (L-info), which
trace the flow of information during the learning process of PPMs. Then, we
demonstrate that the accumulation of L-info during the learning process is
associated with entropy production, and parameters serve as a heat reservoir in
this process, capturing learned information in the form of M-info.
( 2
min )
Recently, there has been a growing interest in learning and explaining causal
effects within Neural Network (NN) models. By virtue of NN architectures,
previous approaches consider only direct and total causal effects assuming
independence among input variables. We view an NN as a structural causal model
(SCM) and extend our focus to include indirect causal effects by introducing
feedforward connections among input neurons. We propose an ante-hoc method that
captures and maintains direct, indirect, and total causal effects during NN
model training. We also propose an algorithm for quantifying learned causal
effects in an NN model and efficient approximation strategies for quantifying
causal effects in high-dimensional data. Extensive experiments conducted on
synthetic and real-world datasets demonstrate that the causal effects learned
by our ante-hoc method better approximate the ground truth effects compared to
existing methods.
( 2
min )
In this paper, we provide a strategy to determine the eigenvalue decay rate
(EDR) of a large class of kernel functions defined on a general domain rather
than $\mathbb S^{d}$. This class of kernel functions includes, but is not
limited to, the neural tangent kernel (NTK) associated with neural networks of
different depths and various activation functions. After proving that the
dynamics of training wide neural networks uniformly approximate those of
neural tangent kernel regression on general domains, we further establish
the minimax optimality of the wide neural network, provided that the
ground truth function $f\in [\mathcal H_{\mathrm{NTK}}]^{s}$, an
interpolation space associated with the RKHS $\mathcal{H}_{\mathrm{NTK}}$ of
the NTK. We also show that the overfitted neural network cannot generalize
well. We believe our approach for determining the EDR of kernels may also be
of independent interest.
( 2
min )
Due to mutual interference between users, power allocation problems in
wireless networks are often non-convex and computationally challenging. Graph
neural networks (GNNs) have recently emerged as a promising approach to
tackling these problems, one that exploits the underlying topology
of wireless networks. In this paper, we propose a novel graph representation
method for wireless networks that include full-duplex (FD) nodes. We then
design a corresponding FD Graph Neural Network (F-GNN) with the aim of
allocating transmit powers to maximise the network throughput. Our results show
that our F-GNN achieves state-of-the-art performance with significantly less
computation time. Besides, F-GNN offers an excellent trade-off between
performance and complexity compared to classical approaches. We further refine
this trade-off by introducing a distance-based threshold for inclusion or
exclusion of edges in the network. We show that an appropriately chosen
threshold reduces required training time by roughly 20% with a relatively minor
loss in performance.
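A minimal sketch of the distance-based edge thresholding mentioned above; node positions and the threshold value are illustrative, and the actual F-GNN graph construction (including full-duplex nodes) is richer.

```python
import numpy as np

def build_edges(positions, threshold):
    """Keep a directed edge (i, j) only when the node distance is below the
    threshold; distant, weakly interfering links are pruned, shrinking the
    graph and hence the training cost."""
    P = np.asarray(positions, dtype=float)
    edges = []
    for i in range(len(P)):
        for j in range(len(P)):
            if i != j and np.linalg.norm(P[i] - P[j]) < threshold:
                edges.append((i, j))
    return edges
```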
( 2
min )
Asymmetrical distance structures (quasimetrics) are ubiquitous in our lives
and are gaining more attention in machine learning applications. Imposing such
quasimetric structures in model representations has been shown to improve many
tasks, including reinforcement learning (RL) and causal relation learning. In
this work, we present four desirable properties in such quasimetric models, and
show how prior works fail at them. We propose Interval Quasimetric Embedding
(IQE), which is designed to satisfy all four criteria. On three quasimetric
learning experiments, IQEs show strong approximation and generalization
abilities, leading to better performance and improved efficiency over prior
methods.
Project Page: https://www.tongzhouwang.info/interval_quasimetric_embedding
Quasimetric Learning Code Package:
https://www.github.com/quasimetric-learning/torch-quasimetric
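As a toy illustration of the quasimetric axioms (not of IQE itself, whose implementation lives in the linked package), the following asymmetric distance is nonnegative, vanishes on the diagonal, and satisfies the triangle inequality while being asymmetric:

```python
import numpy as np

def toy_quasimetric(u, v):
    """d(u, v) = sum_i max(0, u_i - v_i): nonnegative, d(u, u) = 0, and it
    satisfies the triangle inequality, yet d(u, v) != d(v, u) in general."""
    return float(np.maximum(np.asarray(u) - np.asarray(v), 0.0).sum())

print(toy_quasimetric([1.0, 0.0], [0.0, 2.0]))  # 1.0
print(toy_quasimetric([0.0, 2.0], [1.0, 0.0]))  # 2.0
```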
( 2
min )
While recent research advances in speaker diarization mostly focus on
improving the quality of diarization results, there is also an increasing
interest in improving the efficiency of diarization systems. In this paper, we
demonstrate that a multi-stage clustering strategy that uses different
clustering algorithms for input of different lengths can address multi-faceted
challenges of on-device speaker diarization applications. Specifically, a
fallback clusterer is used to handle short-form inputs; a main clusterer is
used to handle medium-length inputs; and a pre-clusterer is used to compress
long-form inputs before they are processed by the main clusterer. Both the main
clusterer and the pre-clusterer can be configured with an upper bound of the
computational complexity to adapt to devices with different resource
constraints. This multi-stage clustering strategy is critical for streaming
on-device speaker diarization systems, where the budgets of CPU, memory and
battery are tight.
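A minimal sketch of the length-based dispatch, with scikit-learn clusterers standing in for the actual fallback, main, and pre-clusterers; the length thresholds are illustrative, and mapping the pre-cluster labels back to individual frames is omitted for brevity.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering, KMeans

def diarize(embeddings, short_len=50, medium_len=2000, max_pre_clusters=256):
    """Dispatch on input length: short inputs get a cheap fallback, medium
    inputs go to the main clusterer, and long inputs are first compressed
    by a pre-clusterer whose output size is bounded."""
    X = np.asarray(embeddings)
    main = AgglomerativeClustering(n_clusters=None, distance_threshold=1.0)
    if len(X) <= short_len:
        return np.zeros(len(X), dtype=int)  # fallback: assume one speaker
    if len(X) <= medium_len:
        return main.fit_predict(X)          # main clusterer
    centers = KMeans(n_clusters=max_pre_clusters).fit(X).cluster_centers_
    return main.fit_predict(centers)        # labels of compressed centers
```

The bound on the pre-clusterer's output is what caps the overall computational complexity on constrained devices.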
( 2
min )
The stochastic block model is a canonical random graph model for clustering
and community detection on network-structured data. Decades of extensive study
on the problem have established many profound results, among which the phase
transition at the Kesten-Stigum threshold is particularly interesting both from
a mathematical and an applied standpoint. It states that no estimator based on
the network topology can perform substantially better than chance on sparse
graphs if the model parameter is below certain threshold. Nevertheless, if we
slightly extend the horizon to the ubiquitous semi-supervised setting, such a
fundamental limitation will disappear completely. We prove that with arbitrary
fraction of the labels revealed, the detection problem is feasible throughout
the parameter domain. Moreover, we introduce two efficient algorithms, one
combinatorial and one based on optimization, to integrate label information
with graph structures. Our work brings a new perspective to the stochastic
modeling of networks and to semidefinite programming research.
( 2
min )
Astronomical observations typically provide three-dimensional maps, encoding
the distribution of the observed flux in (1) the two angles of the celestial
sphere and (2) energy/frequency. An important task regarding such maps is to
statistically characterize populations of point sources too dim to be
individually detected. As the properties of a single dim source will be poorly
constrained, instead one commonly studies the population as a whole, inferring
a source-count distribution (SCD) that describes the number density of sources
as a function of their brightness. Statistical and machine learning methods for
recovering SCDs exist; however, they typically entirely neglect spectral
information associated with the energy distribution of the flux. We present a
deep learning framework able to jointly reconstruct the spectra of different
emission components and the SCD of point-source populations. In a
proof-of-concept example, we show that our method accurately extracts even
complex-shaped spectra and SCDs from simulated maps.
( 2
min )
The field of eXplainable Artificial Intelligence (XAI) aims to bring
transparency to today's powerful but opaque deep learning models. While local
XAI methods explain individual predictions in form of attribution maps, thereby
identifying where important features occur (but not providing information about
what they represent), global explanation techniques visualize what concepts a
model has generally learned to encode. Both types of methods thus only provide
partial insights and leave the burden of interpreting the model's reasoning to
the user. In this work we introduce the Concept Relevance Propagation (CRP)
approach, which combines the local and global perspectives and thus allows
answering both the "where" and "what" questions for individual predictions. We
demonstrate the capability of our method in various settings, showcasing that
CRP leads to more human-interpretable explanations and provides deep insights
into the model's representation and reasoning through concept atlases, concept
composition analyses, and quantitative investigations of concept subspaces and
their role in fine-grained decision making.
( 2
min )
Split Learning (SL) is a promising Distributed Learning approach in
electromyography (EMG) based prosthetic control, due to its applicability
within resource-constrained environments. Other learning approaches, such as
Deep Learning and Federated Learning (FL), provide suboptimal solutions, since
prosthetic devices are extremely limited in terms of processing power and
battery life. The viability of implementing SL in such scenarios stems from
its inherent model partitioning, with clients executing the smaller model
segment. However, selecting an inadequate cut layer hinders the training
process in SL systems. This paper presents an algorithm for optimal cut layer
selection in terms of maximizing the convergence rate of the model. The
performance evaluation demonstrates that the proposed algorithm substantially
accelerates the convergence in an EMG pattern recognition task for improving
prosthetic device control.
( 2
min )
The accurate identification of walnuts within orchards brings forth a
plethora of advantages, profoundly amplifying the efficiency and productivity
of walnut orchard management. Nevertheless, the closely resembling shapes,
colors, and textures of walnuts and leaves present a formidable challenge in
precisely distinguishing between them during annotation. In this
study, we present a novel approach to improve walnut detection efficiency,
utilizing YOLOv5 trained on an enriched image set that incorporates both real
and synthetic RGB and NIR images. Our analysis comparing results from our
original and augmented datasets shows clear improvements in detection when
using the synthetic images.
( 2
min )
Many adversarial attacks target natural language processing systems, most of
which succeed through modifying the individual tokens of a document. Despite
the apparent uniqueness of each of these attacks, fundamentally they are simply
a distinct configuration of four components: a goal function, allowable
transformations, a search method, and constraints. In this survey, we
systematically present the different components used throughout the literature,
using an attack-independent framework which allows for easy comparison and
categorisation of components. Our work aims to serve as a comprehensive guide
for newcomers to the field and to spark targeted research into refining the
individual attack components.
( 2
min )
Due to their unsupervised training and uncertainty estimation, deep
Variational Autoencoders (VAEs) have become powerful tools for
reconstruction-based Time Series Anomaly Detection (TSAD). Existing VAE-based
TSAD methods, either statistical or deep, tune meta-priors to estimate the
likelihood probability for effectively capturing spatiotemporal dependencies in
the data. However, these methods confront the challenge of inherent data
scarcity, which is often the case in anomaly detection tasks. Such scarcity
easily leads to latent holes, discontinuous regions in latent space, resulting
in non-robust reconstructions on these discontinuous spaces. We propose a novel
generative framework that combines VAEs with self-supervised learning (SSL) to
address this issue.
( 2
min )
Recently, Heterogeneous Federated Learning (HtFL) has attracted attention due
to its ability to support heterogeneous models and data. To reduce the high
communication cost of transmitting model parameters, a major challenge in HtFL,
prototype-based HtFL methods are proposed to solely share class
representatives, a.k.a. prototypes, among heterogeneous clients while
maintaining the privacy of clients' models. However, these prototypes are
naively aggregated into global prototypes on the server using weighted
averaging, resulting in suboptimal global knowledge which negatively impacts
the performance of clients. To overcome this challenge, we introduce a novel
HtFL approach called FedTGP, which leverages our Adaptive-margin-enhanced
Contrastive Learning (ACL) to learn Trainable Global Prototypes (TGP) on the
server. By incorporating ACL, our approach enhances prototype separability
while preserving semantic meaning. Extensive experiments with twelve
heterogeneous models demonstrate that our FedTGP surpasses state-of-the-art
methods by up to 9.08% in accuracy while maintaining the communication and
privacy advantages of prototype-based HtFL. Our code is available at
https://github.com/TsingZ0/FedTGP.
( 2
min )
We address the challenge of estimating the learning rate for adaptive
gradient methods used in training deep neural networks. While several
learning-rate-free approaches have been proposed, they are typically tailored
for steepest descent. However, although steepest descent methods offer an
intuitive approach to finding minima, many deep learning applications require
adaptive gradient methods to achieve faster convergence. In this paper, we
interpret adaptive gradient methods as steepest descent applied on
parameter-scaled networks, proposing learning-rate-free adaptive gradient
methods. Experimental results verify the effectiveness of this approach,
demonstrating comparable performance to hand-tuned learning rates across
various scenarios. This work extends the applicability of learning-rate-free
methods, enhancing training with adaptive gradient methods.
( 2
min )
This paper explores the application of CNN-DNN network fusion to construct a
robot navigation controller within a simulated environment. The simulated
environment is constructed to model a subterranean rescue situation, such that
an autonomous agent is tasked with finding a goal within an unknown cavernous
system. Imitation learning is used to train the control algorithm to use LiDAR
and camera data to navigate the space and find the goal. The trained model is
then tested for robustness using Monte Carlo simulation.
( 2
min )
Today, many users deploy their microservice-based applications with various
interconnections on a cluster of Cloud machines, subject to stochastic changes
due to dynamic user requirements. To address this problem, we compare three
machine learning (ML) models for predicting the microservice call rates based
on the microservice times and aiming at estimating the scalability
requirements. We apply the linear regression (LR), multilayer perceptron (MLP),
and gradient boosting regression (GBR) models on the Alibaba microservice
traces. The prediction results reveal that the LR model reaches a lower
training time than the GBR and MLP models. However, the GBR reduces the mean
absolute error and the mean absolute percentage error compared to LR and MLP
models. Moreover, the results show that the number of replicas required for
each microservice, as predicted by the gradient boosting model, closely
matches the actual test data.
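A runnable sketch of the three-model comparison, with synthetic data standing in for the Alibaba traces (the actual features and targets are not reproduced here):

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.neural_network import MLPRegressor
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_absolute_error, mean_absolute_percentage_error

# Synthetic stand-in: features play the role of microservice times,
# the target plays the role of the call rate.
rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 4))
y = 50 + X @ np.array([3.0, -2.0, 1.0, 0.5]) + rng.normal(scale=0.5, size=2000)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

for name, model in [("LR", LinearRegression()),
                    ("MLP", MLPRegressor(hidden_layer_sizes=(64,), max_iter=1000)),
                    ("GBR", GradientBoostingRegressor())]:
    pred = model.fit(X_tr, y_tr).predict(X_te)
    print(name,
          round(mean_absolute_error(y_te, pred), 3),
          round(mean_absolute_percentage_error(y_te, pred), 4))
```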
( 2
min )
Federated learning is an emerging distributed machine learning framework in
the Internet of Vehicles (IoV). In IoV, millions of vehicles are willing to
train a shared model and contribute their knowledge. Maintaining an active
state requires each participant to report its status to the FL server at
fixed intervals and take part in the next round, which becomes very costly
when a huge number of vehicles participate. In this paper, we propose a
distributed client selection scheme that reduces the cost of maintaining the
active state for all participants: the clients with the highest evaluation
scores are elected among their neighbours. The evaluator considers four
variables: sample quantity, available throughput, computational capability,
and the quality of the local dataset. We adopt fuzzy logic as the evaluator
since no closed-form solution over the four variables exists. Extensive
simulation results show that our proposal approximates centralized client
selection in terms of accuracy while significantly reducing the communication
overhead.
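A toy stand-in for the evaluator over the four variables named above, using piecewise-linear membership functions and a simple average in place of the paper's fuzzy rule base; all ranges are illustrative assumptions.

```python
import numpy as np

def membership_high(x, lo, hi):
    """Piecewise-linear 'high' membership degree in [0, 1]."""
    return float(np.clip((x - lo) / (hi - lo), 0.0, 1.0))

def evaluate_client(samples, throughput, compute, quality):
    """Toy evaluator: average the membership degrees of the four variables."""
    degrees = [membership_high(samples, 100, 10_000),   # sample quantity
               membership_high(throughput, 1.0, 50.0),  # Mb/s, illustrative
               membership_high(compute, 0.5, 3.0),      # GHz, illustrative
               membership_high(quality, 0.0, 1.0)]      # local data quality
    return sum(degrees) / len(degrees)

# Neighbours elect the client with the highest evaluation score.
```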
( 2
min )
To address the limitations of traffic prediction from location-bound
detectors, we present Geographical Cellular Traffic (GCT) flow, a novel data
source that leverages the extensive coverage of cellular traffic to capture
mobility patterns. Our extensive analysis validates its potential for
transportation. Focusing on vehicle-related GCT flow prediction, we propose a
graph neural network that integrates multivariate, temporal, and spatial facets
for improved accuracy. Experiments reveal our model's superiority over
baselines, especially in long-term predictions. We also highlight the potential
for GCT flow integration into transportation systems.
( 2
min )
In this work, we study the convergence of Hermitian Dynamic Mode
Decomposition (HDMD) to the spectral properties of self-adjoint Koopman
operators. HDMD is a data-driven method for approximating the Koopman
operator associated with an unknown nonlinear dynamical system from
discrete-time snapshots, while preserving the self-adjointness of the operator
on its finite-dimensional approximations. We show that, under suitable
conditions, the eigenvalues and eigenfunctions of HDMD converge to the spectral
properties of the underlying Koopman operator. Along the way, we establish a
general theorem on the convergence of spectral measures, and demonstrate our
results numerically on the two-dimensional Schr\"odinger equation.
( 2
min )
A genuine signer's signature is naturally unstable even over short time
intervals, whereas expert forgers try to perfectly mimic it. This puts a
genuine signer at risk of being denied access while a forger is granted
access. The implication is a high false acceptance rate (FAR): the percentage
of forged signatures classified as genuine. Existing work has only scratched
the surface of signature verification because the misclassification error
remains high. In this paper, a consensus-threshold
distance-based classifier criterion is proposed for offline writer-dependent
signature verification. Using features extracted from SigNet and SigNet-F deep
convolutional neural network models, the proposed classifier minimizes FAR.
This is demonstrated via experiments on four datasets: GPDS-300, MCYT, CEDAR
and Brazilian PUC-PR datasets. On GPDS-300, the consensus threshold classifier
improves the state-of-the-art performance by achieving a 1.27% FAR compared to
8.73% and 17.31% recorded in the literature. This performance is consistent across
other datasets and guarantees that the risk of imposters gaining access to
sensitive documents or transactions is minimal.
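A minimal sketch of a consensus-threshold decision rule of this flavor; the features are assumed to come from a model such as SigNet, and the threshold and vote count are illustrative tuning parameters, not the paper's calibrated values.

```python
import numpy as np

def verify(query, references, threshold, min_votes):
    """Each enrolled reference signature votes 'genuine' when its distance
    to the query falls below the threshold; the query is accepted only if
    at least min_votes references agree (the consensus criterion)."""
    query = np.asarray(query, dtype=float)
    votes = sum(np.linalg.norm(query - np.asarray(r)) < threshold
                for r in references)
    return votes >= min_votes
```

Requiring consensus across several references is what pushes down the FAR: a forgery must fool most enrolled samples, not just one.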
( 2
min )
One common approach to solve multi-objective reinforcement learning (MORL)
problems is to extend conventional Q-learning by using vector Q-values in
combination with a utility function. However, issues can arise with this
approach in the context of stochastic environments, particularly when
optimising for the Scalarised Expected Reward (SER) criterion. This paper
extends prior research, providing a detailed examination of the factors
influencing the frequency with which value-based MORL Q-learning algorithms
learn the SER-optimal policy for an environment with stochastic state
transitions. We empirically examine several variations of the core
multi-objective Q-learning algorithm as well as reward engineering approaches,
and demonstrate the limitations of these methods. In particular, we highlight
the critical impact of the noisy Q-value estimates issue on the stability and
convergence of these algorithms.
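A minimal sketch of the vector Q-value plus utility pattern the abstract refers to, with linear utility weights assumed; it is exactly this greedy, scalarised bootstrapping whose behaviour under the SER criterion the paper examines.

```python
import numpy as np

def scalarise(q_vec, weights=(0.5, 0.5)):
    """Utility function mapping a vector Q-value to a scalar (linear here;
    nonlinear utilities are also common in MORL)."""
    return float(np.dot(q_vec, weights))

def q_update(Q, s, a, r_vec, s_next, alpha=0.1, gamma=0.99):
    """One multi-objective Q-learning step. Q has shape
    (n_states, n_actions, n_objectives); the greedy next action is chosen
    by the utility of its vector Q-value, and the vector estimate is
    bootstrapped toward the vector reward plus discounted next value."""
    a_next = max(range(Q.shape[1]), key=lambda b: scalarise(Q[s_next, b]))
    Q[s, a] += alpha * (np.asarray(r_vec) + gamma * Q[s_next, a_next] - Q[s, a])
    return a_next
```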
( 2
min )
Providing high-quality video at an efficient bitrate is a main challenge in
the video industry. The traditional one-size-fits-all scheme for bitrate
ladders is inefficient, and reaching the best content-aware decision is
computationally impractical due to the extensive encodings required. To mitigate this, we propose a
bitrate and complexity efficient bitrate ladder prediction method using
transfer learning and spatio-temporal features. We propose: (1) using feature
maps from well-known pre-trained DNNs to predict rate-quality behavior with
limited training data; and (2) improving highest quality rung efficiency by
predicting minimum bitrate for top quality and using it for the top rung. The
method tested on 102 video scenes demonstrates 94.1% reduction in complexity
versus brute-force at 1.71% BD-Rate expense. Additionally, transfer learning
was thoroughly studied through four networks and ablation studies.
( 2
min )
Distributed Denial of Service (DDoS) attacks pose a significant threat to the
stability and reliability of online systems. Effective and early detection of
such attacks is pivotal for safeguarding the integrity of networks. In this
work, we introduce an enhanced approach for DDoS attack detection by leveraging
the capabilities of Deep Residual Neural Networks (ResNets) coupled with
synthetic oversampling techniques. Because of the inherent class imbalance in
many cyber-security datasets, conventional methods often struggle with false
negatives, misclassifying subtle DDoS patterns as benign. By applying the
Synthetic Minority Over-sampling Technique (SMOTE) to the CICIDS dataset, we
balance the representation of benign and malicious data points, enabling the
model to better discern intricate patterns indicative of an attack. Our deep
residual network, tailored for this specific task, further refines the
detection process. Experimental results on a real-world dataset demonstrate
that our approach achieves an accuracy of 99.98%, significantly outperforming
traditional methods. This work underscores the potential of combining advanced
data augmentation techniques with deep learning models to bolster
cyber-security defenses.
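A minimal, runnable sketch of the SMOTE rebalancing step using the imbalanced-learn library, with synthetic data standing in for the CICIDS flows:

```python
from collections import Counter
from imblearn.over_sampling import SMOTE
from sklearn.datasets import make_classification

# Imbalanced toy data: roughly 1% "attack" vs 99% "benign" samples.
X, y = make_classification(n_samples=10_000, n_features=20,
                           weights=[0.99, 0.01], random_state=0)
X_res, y_res = SMOTE(random_state=0).fit_resample(X, y)
print(Counter(y), "->", Counter(y_res))  # classes balanced after resampling
```

The resampled set would then be fed to the deep residual network in place of the raw, imbalanced training data.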
( 2
min )
Alzheimer's is a brain disease that gets worse over time and affects memory,
thinking, and behavior. Alzheimer's disease (AD) can be treated and managed if
it is diagnosed early, which can slow the progression of symptoms and improve
quality of life. In this study, we propose using the Visual Transformer (ViT)
and bi-LSTM to process MRI images for diagnosing Alzheimer's disease. We used
ViT to extract features from the MRI and then map them to a feature sequence.
Then, we used Bi-LSTM sequence modeling to keep the interdependencies between
related features. In addition, we evaluated the performance of the proposed
model for the binary classification of AD patients using data from the
Alzheimer's Disease Neuroimaging Initiative (ADNI). Finally, we evaluated our
method against other deep learning models in the literature. The proposed
method performs well in terms of accuracy, precision, F-score, and recall for
the diagnosis of AD.
( 2
min )
Calibration is essential in machine learning. Semi Unsupervised Calibration
through Prior Adaptation (SUCPA) is a calibration algorithm used in (but not
limited to) large-scale language models and is defined by a system of
first-order difference equations. The map derived from this system has the
peculiarity of being non-hyperbolic, with an unbounded set of non-isolated
fixed points. In this work, we prove several convergence properties of this
algorithm from the perspective of dynamical systems. For a binary
classification problem, the algorithm can be shown to always converge; more
precisely, the map is globally asymptotically stable, and the orbits converge
to a single line of fixed points. Finally, we perform numerical experiments
on a real-world application to support the presented results. Experiment code
is available online.
( 2
min )
This work aims at improving the energy efficiency of decentralized learning
by optimizing the mixing matrix, which controls the communication demands
during the learning process. Through rigorous analysis based on a
state-of-the-art decentralized learning algorithm, the problem is formulated as
a bi-level optimization, with the lower level solved by graph sparsification. A
solution with guaranteed performance is proposed for the special case of
fully-connected base topology and a greedy heuristic is proposed for the
general case. Simulations based on real topology and dataset show that the
proposed solution can lower the energy consumption at the busiest node by
54%-76% while maintaining the quality of the trained model.
( 2
min )
Recently, Transformer-based models have made significant progress in time
series prediction, achieving good results and becoming baseline models beyond
DLinear. This paper proposes a U-Net time series prediction model (UnetTSF)
with linear complexity, which adopts the U-Net architecture. We are the first
to use FPN technology to extract features from time series data, replacing
the decomposition of time series into trend and seasonal terms, while
designing a fusion structure suitable for time series data. Tested on 8
open-source datasets against the best linear model, DLinear, UnetTSF achieves
the best results in 31 of 32 test settings, with an average decrease of 10.1%
in MSE and 9.1% in MAE. Compared with the complex Transformer-based PatchTST,
UnetTSF obtains 9 optimal MSE results and 15 optimal MAE results across the
32 test settings.
( 2
min )
This paper presents an innovative approach to address the challenges of
translating multi-modal emotion recognition models to a more practical and
resource-efficient uni-modal counterpart, specifically focusing on speech-only
emotion recognition. Recognizing emotions from speech signals is a critical
task with applications in human-computer interaction, affective computing, and
mental health assessment. However, existing state-of-the-art models often rely
on multi-modal inputs, incorporating information from multiple sources such as
facial expressions and gestures, which may not be readily available or feasible
in real-world scenarios. To tackle this issue, we propose a novel framework
that leverages knowledge distillation and masked training techniques.
( 2
min )
Metasurfaces have widespread applications in fifth-generation (5G) microwave
communication. Among the metasurface family, free-form metasurfaces excel in
achieving intricate spectral responses compared to regular-shape counterparts.
However, conventional numerical methods for free-form metasurfaces are
time-consuming and demand specialized expertise. Alternatively, recent studies
demonstrate that deep learning has great potential to accelerate and refine
metasurface designs. Here, we present XGAN, an extended generative adversarial
network (GAN) with a surrogate for high-quality free-form metasurface designs.
The proposed surrogate provides a physical constraint to XGAN so that XGAN can
accurately generate metasurfaces monolithically from input spectral responses.
In comparative experiments involving 20000 free-form metasurface designs, XGAN
achieves 0.9734 average accuracy and is 500 times faster than the conventional
methodology. This method facilitates the metasurface library building for
specific spectral responses and can be extended to various inverse design
problems, including optical metamaterials, nanophotonic devices, and drug
discovery.
( 2
min )
We introduce the package ddml for Double/Debiased Machine Learning (DDML) in
Stata. Estimators of causal parameters for five different econometric models
are supported, allowing for flexible estimation of causal effects of endogenous
variables in settings with unknown functional forms and/or many exogenous
variables. ddml is compatible with many existing supervised machine learning
programs in Stata. We recommend using DDML in combination with stacking
estimation which combines multiple machine learners into a final predictor. We
provide Monte Carlo evidence to support our recommendation.
( 2
min )
We introduce two data-driven procedures for optimal estimation and inference
in nonparametric models using instrumental variables. The first is a
data-driven choice of sieve dimension for a popular class of sieve two-stage
least squares estimators. When implemented with this choice, estimators of both
the structural function $h_0$ and its derivatives (such as elasticities)
converge at the fastest possible (i.e., minimax) rates in sup-norm. The second
is for constructing uniform confidence bands (UCBs) for $h_0$ and its
derivatives. Our UCBs guarantee coverage over a generic class of
data-generating processes and contract at the minimax rate, possibly up to a
logarithmic factor. As such, our UCBs are asymptotically more efficient than
UCBs based on the usual approach of undersmoothing. As an application, we
estimate the elasticity of the intensive margin of firm exports in a
monopolistic competition model of international trade. Simulations illustrate
the good performance of our procedures in empirically calibrated designs. Our
results provide evidence against common parameterizations of the distribution
of unobserved firm heterogeneity.
( 2
min )
Characterizing the distribution of high-dimensional statistical estimators is
a challenging task, due to the breakdown of classical asymptotic theory in high
dimension. This paper makes progress towards this by developing non-asymptotic
distributional characterizations for approximate message passing (AMP) -- a
family of iterative algorithms that prove effective as both fast estimators and
powerful theoretical machinery -- for both sparse and robust regression. Prior
AMP theory, which focused on high-dimensional asymptotics for the most part,
failed to describe the behavior of AMP when the number of iterations exceeds
$o\big({\log n}/{\log \log n}\big)$ (with $n$ the sample size). We establish
the first finite-sample non-asymptotic distributional theory of AMP for both
sparse and robust regression that accommodates a polynomial number of
iterations. Our results establish the accuracy of the Gaussian approximation
to the AMP iterates, improving upon all prior results and implying enhanced
distributional characterizations for both the optimally tuned Lasso and
robust M-estimators.
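For orientation, the standard AMP recursion for the Lasso, stated as a LaTeX sketch (the paper's robust-regression variant replaces the denoiser):

```latex
% Standard AMP recursion for the Lasso (soft-thresholding denoiser \eta_t),
% with data y = A x + w, A of size n x p, and undersampling ratio \delta = n/p:
\[
  x^{t+1} = \eta_t\big(x^t + A^{\top} z^t\big), \qquad
  z^{t} = y - A x^{t} + \tfrac{1}{\delta}\, z^{t-1}
          \big\langle \eta_{t-1}'\big(x^{t-1} + A^{\top} z^{t-1}\big)\big\rangle,
\]
% where \langle\cdot\rangle averages a vector's entries; the last term is the
% Onsager correction that keeps the effective noise in x^t + A^\top z^t
% approximately Gaussian -- the property the non-asymptotic theory quantifies.
```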
( 2
min )
In this manuscript, we propose an efficient manifold denoiser based on
landmark diffusion and optimal shrinkage under the complicated high dimensional
noise and compact manifold setup. It flexibly handles several setups,
including a high ambient dimension with a manifold embedded in a subspace of
high or low dimension, and noise that may be colored and dependent. A
systematic comparison with other existing algorithms on both
simulated and real datasets is provided. This manuscript is mainly algorithmic
and we report several existing tools and numerical results. Theoretical
guarantees and more comparisons will be reported in the official paper of this
manuscript.
( 2
min )
With the rapid adoption of generative AI applications, there is a need for these applications to respond in time to reduce the perceived latency with higher throughput. Foundation models (FMs) are often pre-trained on vast corpora of data with parameters ranging in scale of millions to billions and beyond. Large language models (LLMs) are a […]
( 15
min )
In this post, we walk you through the process to deploy Amazon Q in your AWS account and add it to your Slack workspace. When you’re done, you’ll wonder how you ever managed without it!
( 8
min )
Ninety-eight percent of retailers plan to invest in generative AI in the next 18 months, according to a new survey conducted by NVIDIA. That makes retail one of the industries racing fastest to adopt generative AI to ramp up productivity, transform customer experiences and improve efficiency. Early deployments in the retail industry include personalized shopping…
( 6
min )
The retail industry is in the midst of a major technology transformation, fueled by the rise in AI. With the highest potential for AI and analytics among all industries, the retail and consumer packaged goods (CPG) sectors are poised to harness the power of AI to enhance operational efficiency, elevate customer and employee experiences and…
( 6
min )
NVIDIA and the Loss Prevention Research Council (LPRC) are collaborating with several AI companies to showcase a real-time solution for combating and preventing organized retail crime (ORC). The integrated offering provides advance notifications of suspicious behavior inside and outside stores so that authorities can intervene early. The LPRC includes asset-protection executives from more than 85…
( 6
min )
AI Weirdness: the strange side of machine learning
( 2
min )
This study examines the impact of class-imbalanced data on deep learning
models and proposes a technique for data balancing by generating synthetic data
for the minority class. Unlike random-based oversampling, our method
prioritizes balancing the informative regions by identifying high entropy
samples. Generating well-placed synthetic data can enhance machine learning
algorithms' accuracy and efficiency, whereas poorly placed ones may lead to
higher misclassification rates. We introduce an algorithm that maximizes the
probability of generating a synthetic sample in the correct region of its class
by optimizing the class posterior ratio. Additionally, to maintain data
topology, synthetic data are generated within each minority sample's
neighborhood. Our experimental results on forty-one datasets demonstrate the
superior performance of our technique in enhancing deep-learning models.
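A minimal sketch of the high-entropy seed selection described above; the class posteriors are assumed to come from some probabilistic classifier, and the subsequent neighborhood-constrained generation step is omitted.

```python
import numpy as np

def posterior_entropy(p):
    """Shannon entropy of a predicted class-posterior vector; high entropy
    flags samples near decision boundaries (the informative regions)."""
    p = np.clip(np.asarray(p, dtype=float), 1e-12, 1.0)
    return float(-(p * np.log(p)).sum())

def pick_seeds(posteriors, k):
    """Return indices of the k minority samples with the highest posterior
    entropy; synthetic points would then be generated inside each seed's
    neighborhood to preserve the data topology."""
    ents = np.array([posterior_entropy(p) for p in posteriors])
    return np.argsort(ents)[-k:]
```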
( 2
min )
Weather station data is a valuable resource for climate prediction, however,
its reliability can be limited in remote locations. To compound the issue,
making local predictions often relies on sensor data that may not be accessible
for a new, previously unmonitored location. In response to these challenges, we
propose a novel zero-shot learning approach designed to forecast various
climate measurements at new and unmonitored locations. Our method surpasses
conventional weather forecasting techniques in predicting microclimate
variables by leveraging knowledge extracted from other geographic locations.
( 2
min )
Current state-of-the-art analyses on the convergence of gradient descent for
training neural networks focus on characterizing properties of the loss
landscape, such as the Polyak-Lojasiewicz (PL) condition and the restricted
strong convexity. While gradient descent converges linearly under such
conditions, it remains an open question whether Nesterov's momentum enjoys
accelerated convergence under similar settings and assumptions. In this work,
we consider a new class of objective functions, where only a subset of the
parameters satisfies strong convexity, and show Nesterov's momentum achieves
acceleration in theory for this objective class. We provide two realizations of
the problem class, one of which is deep ReLU networks; to the best of our
knowledge, this makes our work the first to prove an accelerated convergence
rate for non-trivial neural network architectures.
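For reference, one standard form of Nesterov's accelerated gradient iteration discussed above, with the momentum schedule left abstract:

```latex
% Nesterov's accelerated gradient with step size \eta and momentum \beta_t:
\[
  y_t = x_t + \beta_t\,(x_t - x_{t-1}), \qquad
  x_{t+1} = y_t - \eta\,\nabla f(y_t).
\]
% The gradient is evaluated at the extrapolated point y_t rather than at x_t,
% which is the mechanism behind acceleration under (restricted) strong
% convexity assumptions such as those considered above.
```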
( 2
min )
Discovering human cognitive and emotional states using multi-modal
physiological signals draws attention across various research applications.
Physiological responses of the human body are influenced by human cognition and
commonly used to analyze cognitive states. From a network science perspective,
the interactions of these heterogeneous physiological modalities in a graph
structure may provide insightful information to support prediction of cognitive
states. However, there is no obvious way to derive the exact connectivity
between heterogeneous modalities, and the sub-modalities form a hierarchical
structure. Existing graph neural networks are designed to learn on
non-hierarchical homogeneous graphs with pre-defined graph structures; they
fail to learn from hierarchical, multi-modal physiological data without a
pre-defined graph structure. To this end, we propose a hierarchical
heterogeneous graph generative network (H2G2-Net) that automatically learns a
graph structure without domain knowledge, as well as a powerful representation
on the hierarchical heterogeneous graph in an end-to-end fashion. We validate
the proposed method on the CogPilot dataset that consists of multi-modal
physiological signals. Extensive experiments demonstrate that our proposed
method outperforms the state-of-the-art GNNs by 5%-20% in prediction accuracy.
( 2
min )
Optical lithography is the main enabler of semiconductor manufacturing. It
requires extensive processing to perform the Resolution Enhancement Techniques
(RETs) needed to transfer design data to working Integrated Circuits (ICs). The
processing power and computational runtime for RET tasks are ever increasing
due to the continuous reduction of feature sizes and the expansion of chip
area. State-of-the-art research has explored Machine Learning (ML) technologies
to reduce runtime and computational power; however, they are still not used in
production. In this study, we analyze the reasons holding
back ML computational lithography from being production ready and present a
novel highly scalable end-to-end flow that enables production ready ML-RET
correction.
( 2
min )
Understanding and identifying musical shape plays an important role in music
education and performance assessment. To simplify the otherwise time- and
cost-intensive musical shape evaluation, in this paper we explore how
artificial intelligence (AI) driven models can be applied. Considering musical
shape evaluation as a classification problem, a light-weight Siamese residual
neural network (S-ResNN) is proposed to automatically identify musical shapes.
To assess the proposed approach in the context of piano musical shape
evaluation, we have generated a new dataset, containing 4116 music pieces
derived from 147 piano preparatory exercises and performed in 28 categories of
musical shapes. The experimental results show that the S-ResNN significantly
outperforms a number of benchmark methods in terms of precision, recall, and
F1 score.
( 2
min )
Geometric Sensitive Hashing functions, a family of Locality-Sensitive Hashing
functions, are neural network models that learn class-specific manifold
geometry in supervised learning. However, given a set of supervised learning
tasks, understanding the manifold geometries that can represent each task and
the kinds of relationships between the tasks based on them has received little
attention. We explore a formalization of this question by considering a
generative process where each task is associated with a high-dimensional
manifold, which can be done in brain-like models with neuromodulatory systems.
Following this formulation, we define \emph{Task-specific Geometric Sensitive
Hashing~(T-GSH)} and show that a randomly weighted neural network with a
neuromodulation system can realize this function.
( 2
min )
Sequential optimization methods are often confronted with the curse of
dimensionality in high-dimensional spaces. Current approaches under the
Gaussian process framework are still burdened by the computational complexity
of tracking Gaussian process posteriors and need to partition the optimization
problem into small regions to ensure exploration or assume an underlying
low-dimensional structure. With the idea of moving candidate points
towards more promising positions, we propose a new method based on Markov Chain
Monte Carlo to efficiently sample from an approximated posterior. We provide
theoretical guarantees of its convergence in the Gaussian process Thompson
sampling setting. We also show experimentally that both the Metropolis-Hastings
and the Langevin Dynamics version of our algorithm outperform state-of-the-art
methods in high-dimensional sequential optimization and reinforcement learning
benchmarks.
( 2
min )
Predicting the solubility of given molecules remains crucial in the
pharmaceutical industry. In this study, we revisited this extensively studied
topic, leveraging the capabilities of contemporary computing resources. We
employed two machine learning models: a linear regression model and a graph
convolutional neural network (GCNN) model, using various experimental datasets.
Both methods yielded reasonable predictions, with the GCNN model exhibiting the
highest level of performance. However, the present GCNN model has limited
interpretability, while the linear regression model allows scientists to
conduct a more in-depth analysis of the underlying factors through feature
importance analysis, although more human input and evaluation of the overall
dataset are required. From the perspective of chemistry, using the linear regression model,
we elucidated the impact of individual atom species and functional groups on
overall solubility, highlighting the significance of comprehending how chemical
structure influences chemical properties in the drug development process. We
find that introducing oxygen atoms can increase the solubility of organic
molecules, while almost all heteroatoms other than oxygen and nitrogen tend
to decrease solubility.
( 3
min )
In this paper, we introduce FITS, a lightweight yet powerful model for time
series analysis. Unlike existing models that directly process raw time-domain
data, FITS operates on the principle that time series can be manipulated
through interpolation in the complex frequency domain. By discarding
high-frequency components with negligible impact on time series data, FITS
achieves performance comparable to state-of-the-art models for time series
forecasting and anomaly detection tasks, while having a remarkably compact size
of only approximately $10k$ parameters. Such a lightweight model can be easily
trained and deployed in edge devices, creating opportunities for various
applications. The code is available at: \url{https://github.com/VEWOXIC/FITS}
( 2
min )
Quantum machine learning with quantum kernels for classification problems is
a growing area of research. Recently, quantum kernel alignment techniques that
parameterise the kernel have been developed, allowing the kernel to be trained
and therefore aligned with a specific dataset. While quantum kernel alignment
is a promising technique, it has been hampered by considerable training costs
because the full kernel matrix must be constructed at every training iteration.
Addressing this challenge, we introduce a novel method that seeks to balance
efficiency and performance. We present a sub-sampling training approach that
uses a subset of the kernel matrix at each training step, thereby reducing the
overall computational cost of the training. In this work, we apply the
sub-sampling method to synthetic datasets and a real-world breast cancer
dataset and demonstrate considerable reductions in the number of circuits
required to train the quantum kernel while maintaining classification accuracy.
( 2
min )
Sampling-based model predictive control (MPC) has found significant success
in optimal control problems with non-smooth system dynamics and cost function.
Many machine learning-based works proposed to improve MPC by a) learning or
fine-tuning the dynamics/cost function, or b) learning to optimize for the
update of the MPC controllers. For the latter, imitation learning-based
optimizers are trained to update the MPC controller by mimicking the expert
demonstrations, which, however, are expensive or even unavailable. More
significantly, many sequential decision-making problems are in non-stationary
environments, requiring that an optimizer should be adaptable and generalizable
to update the MPC controller for solving different tasks. To address those
issues, we propose to learn an optimizer based on meta-reinforcement learning
(RL) to update the controllers. This optimizer does not need expert
demonstration and can enable fast adaptation (e.g., few-shot) when it is
deployed in unseen control tasks. Experimental results validate the
effectiveness of the learned optimizer regarding fast adaptation.
( 2
min )
Deep machine learning models including Convolutional Neural Networks (CNN)
have been successful in the detection of Mild Cognitive Impairment (MCI) using
medical images, questionnaires, and videos. This paper proposes a novel
Multi-branch Classifier-Video Vision Transformer (MC-ViViT) model to
distinguish participants with MCI from those with normal cognition by analyzing facial features.
The data comes from the I-CONECT, a behavioral intervention trial aimed at
improving cognitive function by providing frequent video chats. MC-ViViT
extracts spatiotemporal features of videos in one branch and augments
representations by the MC module. The I-CONECT dataset is challenging as the
dataset is imbalanced containing Hard-Easy and Positive-Negative samples, which
impedes the performance of MC-ViViT. We propose a loss function for Hard-Easy
and Positive-Negative Samples (HP Loss) by combining Focal loss and AD-CORRE
loss to address the imbalanced problem. Our experimental results on the
I-CONECT dataset show the great potential of MC-ViViT in predicting MCI with a
high accuracy of 90.63% on some of the interview videos.
( 3
min )
With the continued introduction of driverless events to Formula:Society of
Automotive Engineers (F:SAE) competitions around the world, teams are
investigating all aspects of the autonomous vehicle stack. This paper presents
the use of Deep Reinforcement Learning (DRL) and Inverse Reinforcement Learning
(IRL) to map locally-observed cone positions to a desired steering angle for
race track following. Two state-of-the-art algorithms not previously tested in
this context, soft actor-critic (SAC) and adversarial inverse reinforcement
learning (AIRL), are used to train models in a representative simulation. Three
novel reward functions for use by RL algorithms in an autonomous racing context
are also discussed. Tests performed in simulation and the real world suggest
that both algorithms can successfully train models for local path following.
Suggestions for future work are presented to allow these models to scale to a
full F:SAE vehicle.
( 2
min )
As the global population continues to expand, the demand for natural
resources increases. Unfortunately, human activities account for 23% of
greenhouse gas emissions. On a positive note, remote sensing technologies have
emerged as a valuable tool in managing our environment. These technologies
allow us to monitor land use, plan urban areas, and drive advancements in areas
such as agriculture, climate change mitigation, disaster recovery, and
environmental monitoring. Recent advances in AI, computer vision, and earth
observation data have enabled unprecedented accuracy in land use mapping. By
using transfer learning and fine-tuning with RGB bands, we achieved an
impressive 99.19% accuracy in land use analysis. Such findings can be used to
inform conservation and urban planning policies.
( 2
min )
We develop a distributed Block Chebyshev-Davidson algorithm to solve
large-scale leading eigenvalue problems for spectral analysis in spectral
clustering. First, the efficiency of the Chebyshev-Davidson algorithm relies on
the prior knowledge of the eigenvalue spectrum, which could be expensive to
estimate. This issue can be lessened by the analytic spectrum estimation of the
Laplacian or normalized Laplacian matrices in spectral clustering, making the
proposed algorithm very efficient for spectral clustering. Second, to make the
proposed algorithm capable of analyzing big data, a distributed and parallel
version has been developed with attractive scalability. The speedup by parallel
computing is approximately equivalent to $\sqrt{p}$, where $p$ denotes the
number of processes. Numerical results are provided to demonstrate its
efficiency in spectral clustering and its scalability advantage over existing
eigensolvers used for spectral clustering in parallel computing environments.
( 2
min )
This paper presents our recent initiatives to foster the discoverability of
new releases on the music streaming service Deezer. After introducing our
search and recommendation features dedicated to new releases, we outline our
shift from editorial to personalized release suggestions using cold start
embeddings and contextual bandits. Backed by online experiments, we discuss the
advantages of this shift in terms of recommendation quality and exposure of new
releases on the service.
( 2
min )
Deep generative replay has emerged as a promising approach for continual
learning in decision-making tasks. This approach addresses the problem of
catastrophic forgetting by leveraging the generation of trajectories from
previously encountered tasks to augment the current dataset. However, existing
deep generative replay methods for continual learning rely on autoregressive
models, which suffer from compounding errors in the generated trajectories. In
this paper, we propose a simple, scalable, and non-autoregressive method for
continual learning in decision-making tasks using a generative model that
generates task samples conditioned on the trajectory timestep. We evaluate our
method on Continual World benchmarks and find that our approach achieves
state-of-the-art performance on the average success rate metric among continual
learning methods. Code is available at https://github.com/WilliamYue37/t-DGR.
( 2
min )
Climate change poses increasingly complex challenges to our society. Extreme
weather events such as floods, wildfires, or droughts are becoming more
frequent, spontaneous and difficult to foresee or counteract. In this work we
specifically address the problem of sewage water polluting surface water bodies
after spilling over from rain tanks as a consequence of heavy rain events. We
investigate to what extent state-of-the-art interpretable time series models
can help predict such critical water level points, so that the excess can
promptly be redistributed across the sewage network. Our results indicate that
modern time series models can contribute to better waste water management and
prevention of environmental pollution from sewer systems. All the code and
experiments can be found in our repository:
https://github.com/TeodorChiaburu/RIWWER_TimeSeries.
( 2
min )
Large Language Models (LLMs) have showcased impressive capabilities in
handling straightforward programming tasks. However, their performance tends to
falter when confronted with more challenging programming problems. We observe
that conventional models often generate solutions as monolithic code blocks,
restricting their effectiveness in tackling intricate questions. To overcome
this limitation, we present Modular-of-Thought Coder (MoTCoder). We introduce a
pioneering framework for MoT instruction tuning, designed to promote the
decomposition of tasks into logical sub-tasks and sub-modules. Our
investigations reveal that, through the cultivation and utilization of
sub-modules, MoTCoder significantly improves both the modularity and
correctness of the generated solutions, leading to substantial relative pass@1
improvements of 12.9% on APPS and 9.43% on CodeContests. Our codes are
available at https://github.com/dvlab-research/MoTCoder.
( 2
min )
Physics-informed neural network (PINN) is a data-driven solver for partial
and ordinary differential equations (ODEs/PDEs). It provides a unified framework
to address both forward and inverse problems. However, the complexity of the
objective function often leads to training failures. This issue is particularly
prominent when solving high-frequency and multi-scale problems. We propose
using transfer learning to boost the robustness and convergence of training
PINN, starting training from low-frequency problems and gradually approaching
high-frequency problems. Through two case studies, we find that transfer
learning can effectively train PINN to approximate solutions from low-frequency
problems to high-frequency problems without increasing network parameters.
Furthermore, it requires fewer data points and less training time. We
elaborately described our training strategy, including optimizer selection, and
suggested guidelines for using transfer learning to train neural networks for
solving more complex problems.
( 2
min )
We discuss causal inference for observational studies with possibly invalid
instrumental variables. We propose a novel methodology called two-stage
curvature identification (TSCI) by exploring the nonlinear treatment model with
machine learning. The first-stage machine learning improves the instrumental
variable's strength and adjusts for different forms of violation of the
instrumental variable assumptions. The success of TSCI requires the
instrumental variable's effect on treatment to differ from its violation form.
A novel bias correction step is implemented to remove bias resulting from the
potentially high complexity of machine learning. Our proposed \texttt{TSCI}
estimator is shown to be asymptotically unbiased and Gaussian even if the
machine learning algorithm does not consistently estimate the treatment model.
Furthermore, we design a data-dependent method to choose the best among several
candidate violation forms. We apply TSCI to study the effect of education on
earnings.
( 2
min )
A multimodal system uses models trained on language, vision, and action data to help robots develop and execute plans for household, construction, and manufacturing tasks.
( 10
min )
MIT researchers propose “PEDS” method for developing models of complex physical systems in mechanics, optics, thermal transport, fluid dynamics, physical chemistry, climate, and more.
( 8
min )
AWS customers in healthcare, financial services, the public sector, and other industries store billions of documents as images or PDFs in Amazon Simple Storage Service (Amazon S3). However, they’re unable to gain insights such as using the information locked in the documents for large language models (LLMs) or search until they extract the text, forms, […]
( 10
min )
Generative AI is transforming drug research and development, enabling new discoveries faster than ever — and Amgen, one of the world’s leading biotechnology companies, is tapping the technology to power its research. Amgen will build AI models trained to analyze one of the world’s largest human datasets on an NVIDIA DGX SuperPOD, a full-stack data […]
( 6
min )
In perhaps the healthcare industry’s most dramatic transformation since the advent of computing, digital biology and generative AI are helping to reinvent drug discovery, surgery, medical imaging and wearable devices. NVIDIA has been preparing for this moment for over a decade, building deep domain expertise, creating the NVIDIA Clara healthcare-specific computing platform and expanding its […]
( 7
min )
The AI revolution returned to where it started this week, putting powerful new tools into the hands of gamers and content creators. Generative AI models that will bring lifelike characters to games and applications and new GPUs for gamers and creators were among the highlights of a news-packed address Monday ahead of this week’s CES […]
( 9
min )
Amid explosive interest in generative AI, the auto industry is racing to embrace the power of AI across a range of critical activities, from vehicle design, engineering and manufacturing, to marketing and sales. The adoption of generative AI — along with the growing importance of software-defined computing — will continue to transform the automotive market […]
( 6
min )
NVIDIA Studio is debuting at CES powerful new software and hardware upgrades to elevate content creation.
( 11
min )
Twitch, OBS and NVIDIA are leveling up livestreaming technology with the new Twitch Enhanced Broadcasting beta, powered by GeForce RTX GPUs. Launching in a few days, the beta will let streamers stream multiple encodes concurrently, providing optimal viewing experiences for all viewers.
( 5
min )
Getty Images, a global visual content creator and marketplace, today at CES released Generative AI by iStock, an affordable and commercially safe image generation service trained on the company’s creative library of licensed, proprietary data. Built on NVIDIA Picasso, a foundry for custom AI models, Generative AI by iStock provides designers and businesses with a […]
( 5
min )
Whether building a super-capable truck or conjuring up a dream sports car, spending hours playing with online car configurators is easy. With auto industry insiders predicting that most new vehicle purchases will move online by 2030, these configurators are more than just toys. They’re crucial to the future of the world’s automakers — essential in […]
( 6
min )
NVIDIA is bringing more games, membership options and innovative tech to its GeForce NOW cloud gaming service. The next Activision and Blizzard titles to join the cloud, Diablo IV and Overwatch 2, will be coming soon. They’ll be joined by a host of top titles, including Capcom’s Exoprimal, HoYoverse’s Honkai: Star Rail and Mainframe Industries’ […]
( 9
min )
Generative AI is reshaping trillion-dollar industries, and NVIDIA, a front-runner in smart robotics, is seizing the moment. Speaking today as part of a special address ahead of CES, NVIDIA Vice President of Robotics and Edge Computing Deepu Talla detailed how NVIDIA and its partners are bringing generative AI and robotics together. It’s a natural fit […]
( 6
min )
In our fast-changing, digitized world, business strategies and content planning are also moving into the world of numbers, minimizing the need for human work. Nowadays, artificial intelligence is developing day by day, reaching more and more users and areas of use. Below you will learn about AI chatbots, their advantages and disadvantages. You will… […]
( 23
min )
There is a new letter in TIME, What Generative AI Reveals About the Human Mind, where a professor wrote, “Natural brains must learn to predict those sensory flows in a very special kind of context—the context of using the sensory information to select actions that help us survive and thrive in our worlds.” This means… […]
( 20
min )
In the fast-paced landscape of data-driven decision-making, real-time analytics has become paramount for organizations seeking to gain insights at the speed of business. Database streaming services have emerged as a transformative solution, enabling the processing and analysis of data in motion. This article explores the capabilities of database streaming services and… […]
( 21
min )
In Part 1 of the series “GenAI: Beware the Productivity Trap,” we discussed embracing an economic mindset to avoid falling into the productivity trap. We discussed some challenges with the productivity trap and then reviewed some data economic concepts that can take your organization to the next level of game-changing performance and innovation. In Part… […]
( 20
min )
Recent CNN and Transformer-based models tried to utilize frequency and
periodicity information for long-term time series forecasting. However, most
existing work is based on the Fourier transform, which cannot capture fine-grained
and local frequency structure. In this paper, we propose a Wavelet-Fourier
Transform Network (WFTNet) for long-term time series forecasting. WFTNet
utilizes both Fourier and wavelet transforms to extract comprehensive
temporal-frequency information from the signal, where Fourier transform
captures the global periodic patterns and wavelet transform captures the local
ones. Furthermore, we introduce a Periodicity-Weighted Coefficient (PWC) to
adaptively balance the importance of global and local frequency patterns.
Extensive experiments on various time series datasets show that WFTNet
consistently outperforms other state-of-the-art baselines. Code is available at
https://github.com/Hank0626/WFTNet.
( 2
min )
A cost-effective alternative to manual data labeling is weak supervision
(WS), where data samples are automatically annotated using a predefined set of
labeling functions (LFs), rule-based mechanisms that generate artificial labels
for the associated classes. In this work, we investigate noise reduction
techniques for WS based on the principle of k-fold cross-validation. We
introduce ULF, a new algorithm for Unsupervised Labeling Function correction,
which denoises WS data by leveraging models trained on all but some LFs to
identify and correct biases specific to the held-out LFs. Specifically, ULF
refines the allocation of LFs to classes by re-estimating this assignment on
highly reliable cross-validated samples. Evaluation on multiple datasets
confirms ULF's effectiveness in enhancing WS learning without the need for
manual labeling.
( 2
min )
We study the influence of different activation functions in the output layer
of deep neural network models for soft and hard label prediction in the
learning with disagreement task. In this task, the goal is to quantify the
amount of disagreement via predicting soft labels. To predict the soft labels,
we use BERT-based preprocessors and encoders and vary the activation function
used in the output layer, while keeping other parameters constant. The soft
labels are then used for the hard label prediction. The activation functions
considered are the sigmoid, a step function added to the model post-training,
and a sinusoidal activation function, which is introduced for the first time
in this paper.
( 2
min )
Bayesian networks (BNs) are a foundational model in machine learning and
causal inference. Their graphical structure can handle high-dimensional
problems and divide them into a sparse collection of smaller ones; it underlies
Judea Pearl's causality and determines their explainability and interpretability.
Despite their popularity, there are almost no resources in the literature on
how to compute Shannon's entropy and the Kullback-Leibler (KL) divergence for
BNs under their most common distributional assumptions. In this paper, we
provide computationally efficient algorithms for both by leveraging BNs'
graphical structure, and we illustrate them with a complete set of numerical
examples. In the process, we show it is possible to reduce the computational
complexity of KL from cubic to quadratic for Gaussian BNs.
( 2
min )
Fake news detection models are critical to countering disinformation but can
be manipulated through adversarial attacks. In this position paper, we analyze
how an attacker can compromise the performance of an online learning detector
on specific news content without being able to manipulate the original target
news. In some contexts, such as social networks, where the attacker cannot
exert complete control over all the information, this scenario can indeed be
quite plausible. Therefore, we show how an attacker could potentially introduce
poisoning data into the training data to manipulate the behavior of an online
learning method. Our initial findings reveal varying susceptibility of logistic
regression models based on complexity and attack type.
( 2
min )
We present a deep learning model to automatically generate computer models of
the human heart from patient imaging data with an emphasis on its capability to
generate thin-walled cardiac structures. Our method works by deforming a
template mesh to fit the cardiac structures to the given image. Compared with
prior deep learning methods that adopted this approach, our framework is
designed to minimize mesh self-penetration, which typically arises when
deforming surface meshes separated by small distances. We achieve this by using
a two-stage diffeomorphic deformation process along with a novel loss function
derived from the kinematics of motion that penalizes surface contact and
interpenetration. Our model demonstrates comparable accuracy with
state-of-the-art methods while additionally producing meshes free of
self-intersections. The resultant meshes are readily usable in physics-based
simulation, minimizing the need for post-processing and cleanup.
( 2
min )
Large language models have made significant strides in natural language
processing, enabling innovative applications in molecular science by processing
textual representations of molecules. However, most existing language models
cannot capture the rich information in complex molecular structures or
images. In this paper, we introduce GIT-Mol, a multi-modal large language model
that integrates the Graph, Image, and Text information. To facilitate the
integration of multi-modal molecular data, we propose GIT-Former, a novel
architecture that is capable of aligning all modalities into a unified latent
space. We achieve a 5%-10% accuracy increase in property prediction and a
20.2% boost in molecule generation validity compared to the baselines. With the
any-to-language molecular translation strategy, our model has the potential to
perform more downstream tasks, such as compound name recognition and chemical
reaction prediction.
( 2
min )
The conflict between stiffness and toughness is a fundamental problem in
engineering materials design. However, the systematic discovery of
microstructured composites with optimal stiffness-toughness trade-offs has
never been demonstrated, hindered by the discrepancies between simulation and
reality and the lack of data-efficient exploration of the entire Pareto front.
We introduce a generalizable pipeline that integrates physical experiments,
numerical simulations, and artificial neural networks to address both
challenges. Without any prescribed expert knowledge of material design, our
approach implements a nested-loop proposal-validation workflow to bridge the
simulation-to-reality gap and discover microstructured composites that are
stiff and tough with high sample efficiency. Further analysis of Pareto-optimal
designs allows us to automatically identify existing toughness enhancement
mechanisms, which were previously discovered through trial-and-error or
biomimicry. On a broader scale, our method provides a blueprint for
computational design in various research areas beyond solid mechanics, such as
polymer chemistry, fluid dynamics, meteorology, and robotics.
( 2
min )
Kernel Stein discrepancies (KSDs) measure the quality of a distributional
approximation and can be computed even when the target density has an
intractable normalizing constant. Notable applications include the diagnosis of
approximate MCMC samplers and goodness-of-fit tests for unnormalized
statistical models. The present work analyzes the convergence control
properties of KSDs. We first show that standard KSDs used for weak convergence
control fail to control moment convergence. To address this limitation, we next
provide sufficient conditions under which alternative diffusion KSDs control
both moment and weak convergence. As an immediate consequence we develop, for
each $q > 0$, the first KSDs known to exactly characterize $q$-Wasserstein
convergence.
( 2
min )
This paper concerns the training of a single-layer morphological perceptron
using disciplined convex-concave programming (DCCP). We introduce an algorithm
referred to as K-DDCCP, which combines the existing single-layer morphological
perceptron (SLMP) model proposed by Ritter and Urcid with the weighted
disciplined convex-concave programming (WDCCP) algorithm by Charisopoulos and
Maragos. The proposed training algorithm leverages the disciplined
convex-concave procedure (DCCP) and formulates a non-convex optimization
problem for binary classification. To tackle this problem, the constraints are
expressed as differences of convex functions, enabling the application of the
DCCP package. The experimental results confirm the effectiveness of the K-DDCCP
algorithm in solving binary classification problems. Overall, this work
contributes to the field of morphological neural networks by proposing an
algorithm that extends the capabilities of the SLMP model.
( 2
min )
Although deep learning-based algorithms have demonstrated excellent
performance in automated emotion recognition via electroencephalogram (EEG)
signals, variations across brain signal patterns of individuals can diminish
the model's effectiveness when applied across different subjects. While
transfer learning techniques have exhibited promising outcomes, they still
encounter challenges related to inadequate feature representations and may
overlook the fact that source subjects themselves can possess distinct
characteristics. In this work, we propose a multi-source domain adaptation
approach with a transformer-based feature generator (MSDA-TF) designed to
leverage information from multiple sources. The proposed feature generator
retains convolutional layers to capture shallow spatial, temporal, and spectral
EEG data representations, while self-attention mechanisms extract global
dependencies within these features. During the adaptation process, we group the
source subjects based on correlation values and aim to align the moments of the
target subject with each source as well as within the sources. MSDA-TF is
validated on the SEED dataset and is shown to yield promising results.
( 2
min )
Distributional Reinforcement Learning (RL) estimates the return distribution
mainly by learning quantile values via minimizing the quantile Huber loss
function, entailing a threshold parameter often selected heuristically or via
hyperparameter search, which may not generalize well and can be suboptimal.
This paper introduces a generalized quantile Huber loss function derived from
Wasserstein distance (WD) calculation between Gaussian distributions, capturing
noise in predicted (current) and target (Bellman-updated) quantile values.
Compared to the classical quantile Huber loss, this innovative loss function
enhances robustness against outliers. Notably, the classical Huber loss
function can be seen as an approximation of our proposed loss, enabling
parameter adjustment by approximating the amount of noise in the data during
the learning process. Empirical tests on Atari games, a common application in
distributional RL, and a recent hedging strategy using distributional RL,
validate the effectiveness of our proposed loss function and its potential for
parameter adjustments in distributional RL.
( 2
min )
A critical factor in trustworthy machine learning is to develop robust
representations of the training data. Only under this guarantee is it
legitimate to artificially generate data, for example, to counteract imbalanced
datasets or provide counterfactual explanations for blackbox decision-making
systems. In recent years, Generative Adversarial Networks (GANs) have shown
considerable results in forming stable representations and generating realistic
data. While many applications focus on generating image data, less effort has
been made in generating time series data, especially multivariate signals. In
this work, a Transformer-based autoencoder is proposed that is regularized
using an adversarial training scheme to generate artificial multivariate time
series signals. The representation is evaluated using t-SNE visualizations,
Dynamic Time Warping (DTW) and Entropy scores. Our results indicate that the
generated signals exhibit higher similarity to an exemplary dataset than using
a convolutional network approach.
( 2
min )
Pipeline parallelism is an essential technique in the training of large-scale
Transformer models. However, it suffers from imbalanced memory consumption,
leading to insufficient memory utilization. The BPipe technique was proposed to
address this issue and has proven effective in the GPT-3 model. Nevertheless,
our experiments have not yielded similar benefits for LLaMA training.
Additionally, BPipe only yields negligible benefits for GPT-3 training when
applying flash attention. We analyze the underlying causes of the divergent
performance of BPipe on GPT-3 and LLaMA. Furthermore, we introduce a novel
method to estimate the performance of BPipe.
( 2
min )
The universal approximation theorem states that a neural network with one
hidden layer can approximate continuous functions on compact sets with any
desired precision. This theorem supports using neural networks for various
applications, including regression and classification tasks. Furthermore, it is
valid for real-valued neural networks and some hypercomplex-valued neural
networks such as complex-, quaternion-, tessarine-, and Clifford-valued neural
networks. However, hypercomplex-valued neural networks are a type of
vector-valued neural network defined on an algebra with additional algebraic or
geometric properties. This paper extends the universal approximation theorem
for a wide range of vector-valued neural networks, including
hypercomplex-valued models as particular instances. Precisely, we introduce the
concept of non-degenerate algebra and state the universal approximation theorem
for neural networks defined on such algebras.
( 2
min )
We introduce a novel sampler called the energy based diffusion generator for
generating samples from arbitrary target distributions. The sampling model
employs a structure similar to a variational autoencoder, utilizing a decoder
to transform latent variables from a simple distribution into random variables
approximating the target distribution, and we design an encoder based on the
diffusion model. Leveraging the powerful modeling capacity of the diffusion
model for complex distributions, we can obtain an accurate variational estimate
of the Kullback-Leibler divergence between the distributions of the generated
samples and the target. Moreover, we propose a decoder based on generalized
Hamiltonian dynamics to further enhance sampling performance. Through empirical
evaluation, we demonstrate the effectiveness of our method across various
complex distribution functions, showcasing its superiority compared to existing
methods.
( 2
min )
This project explores adversarial training techniques to develop fairer Deep
Neural Networks (DNNs) to mitigate the inherent bias they are known to exhibit.
DNNs are susceptible to inheriting bias with respect to sensitive attributes
such as race and gender, which can lead to life-altering outcomes (e.g.,
demographic bias in facial recognition software used to arrest a suspect). We
propose a robust optimization problem, which we demonstrate can improve
fairness on several datasets, both synthetic and real-world, using an affine
linear model. Leveraging second order information, we are able to find a
solution to our optimization problem more efficiently than a purely first order
method.
( 2
min )
The Ising model is important in statistical modeling and inference in many
applications; however, its normalizing constant, mean number of active vertices
and mean spin interaction -- quantities needed in inference -- are
computationally intractable. We provide accurate approximations that make it
possible to numerically calculate these quantities in the homogeneous case.
Simulation studies indicate good performance of our approximation formulae that
are scalable and unfazed by the size (number of nodes, degree of graph) of the
Markov Random Field. The practical import of our approximation formulae is
illustrated in performing Bayesian inference in a functional Magnetic Resonance
Imaging activation detection experiment, and also in likelihood ratio testing
for anisotropy in the spatial patterns of yearly increases in pistachio tree
yields.
( 2
min )
In this research, we investigate the structural evolution of the cosmic web,
employing advanced methodologies from Topological Data Analysis. Our approach
involves leveraging \emph{Persistence Signals}, an innovative method from recent
literature that facilitates the embedding of persistence diagrams into vector
spaces by re-conceptualizing them as signals in $\mathbb R^2_+$. Utilizing this
methodology, we analyze three quintessential cosmic structures: clusters,
filaments, and voids. A central discovery is the correlation between
\emph{Persistence Energy} and redshift values, linking persistent homology with
cosmic evolution and providing insights into the dynamics of cosmic structures.
( 2
min )
This post was written in collaboration with Bhajandeep Singh and Ajay Vishwakarma from Wipro’s AWS AI/ML Practice. Many organizations have been using a combination of on-premises and open source data science solutions to create and manage machine learning (ML) models. Data science and DevOps teams may face challenges managing these isolated tool stacks and systems. […]
( 13
min )
Deep generative models have been demonstrated as problematic in the
unsupervised out-of-distribution (OOD) detection task, where they tend to
assign higher likelihoods to OOD samples. Previous studies on this issue are
usually not applicable to the Variational Autoencoder (VAE). As a popular
subclass of generative models, the VAE can be effective with a relatively
smaller model size and be more stable and faster in training and inference,
which can be more advantageous in real-world applications. In this paper, we
propose a novel VAE-based score called Error Reduction (ER) for OOD detection,
which is based on a VAE that takes a lossy version of the training set as
inputs and the original set as targets. Experiments are carried out on various
datasets to show the effectiveness of our method; we also present the effect of
design choices with ablation experiments. Our code is available at:
https://github.com/ZJLAB-AMMI/VAE4OOD.
( 2
min )
Leakages are a major risk in water distribution networks as they cause water
loss and increase contamination risks. Leakage detection is a difficult task
due to the complex dynamics of water distribution networks. In particular,
small leakages are hard to detect. From a machine-learning perspective,
leakages can be modeled as concept drift. Thus, a wide variety of drift
detection schemes seems to be a suitable choice for detecting leakages. In this
work, we explore the potential of model-loss-based and distribution-based drift
detection methods to tackle leakage detection. We additionally discuss the
issue of temporal dependencies in the data and propose a way to cope with it
when applying distribution-based detection. We evaluate different methods
systematically for leakages of different sizes and detection times.
Additionally, we propose a first drift-detection-based technique for localizing
leakages.
( 2
min )
In this paper, we investigate improving the adversarial robustness
obtained in adversarial training (AT) via reducing the difficulty of
optimization. To better study this problem, we build a novel Bregman divergence
perspective for AT, in which AT can be viewed as the sliding process of the
training data points on the negative entropy curve. Based on this perspective,
we analyze the learning objectives of two typical AT methods, i.e., PGD-AT and
TRADES, and we find that the optimization process of TRADES is easier than
that of PGD-AT, because TRADES separates the PGD-AT objective. In addition, we discuss the function
of entropy in TRADES, and we find that models with high entropy can be better
robustness learners. Inspired by the above findings, we propose two methods,
i.e., FAIT and MER, both of which not only reduce the difficulty of
optimization under 10-step PGD adversaries, but also provide better
robustness. Our work suggests that reducing the difficulty of optimization
under the 10-step PGD adversaries is a promising approach for enhancing the
adversarial robustness in AT.
( 2
min )
In many recent works, there is an increased focus on designing algorithms
that seek flatter optima for neural network loss optimization as there is
empirical evidence that it leads to better generalization performance in many
datasets. In this work, we dissect these performance gains through the lens of
data memorization in overparameterized models. We define a new metric that
helps us identify the specific data points on which algorithms seeking flatter
optima do better than vanilla SGD. We find that the generalization
gains achieved by Sharpness Aware Minimization (SAM) are particularly
pronounced for atypical data points, which necessitate memorization. This
insight helps us unearth higher privacy risks associated with SAM, which we
verify through exhaustive empirical evaluations. Finally, we propose mitigation
strategies to achieve a more desirable accuracy vs privacy tradeoff.
( 2
min )
Neural algorithmic reasoners are parallel processors. Teaching them
sequential algorithms contradicts this nature, rendering a significant share of
their computations redundant. Parallel algorithms however may exploit their
full computational power, therefore requiring fewer layers to be executed. This
drastically reduces training times, as we observe when comparing parallel
implementations of searching, sorting and finding strongly connected components
to their sequential counterparts on the CLRS framework. Additionally, parallel
versions achieve (often strongly) superior predictive performance.
( 2
min )
We introduce ensembles of stochastic neural networks to approximate the
Bayesian posterior, combining stochastic methods such as dropout with deep
ensembles. The stochastic ensembles are formulated as families of distributions
and trained to approximate the Bayesian posterior with variational inference.
We implement stochastic ensembles based on Monte Carlo dropout, DropConnect and
a novel non-parametric version of dropout and evaluate them on a toy problem
and CIFAR image classification. For both tasks, we test the quality of the
posteriors directly against Hamiltonian Monte Carlo simulations. Our results
show that stochastic ensembles provide more accurate posterior estimates than
other popular baselines for Bayesian inference.
( 2
min )
Identifying reaction coordinates (RCs) is an active area of research, given
the crucial role RCs play in determining the progress of a chemical reaction.
The choice of the reaction coordinate is often based on heuristic knowledge.
However, an essential criterion for the choice is that the coordinate should
capture both the reactant and product states unequivocally. Also, the
coordinate should be the slowest one so that all the other degrees of freedom
can easily equilibrate along the reaction coordinate. We used a regularised sparse
autoencoder, an energy-based model, to discover a crucial set of reaction
coordinates. Along with discovering reaction coordinates, our model also
predicts the evolution of a molecular dynamics (MD) trajectory. We showed that
including a sparsity-enforcing regularisation helps in choosing a small but
important set of reaction coordinates. We used two model systems to demonstrate
our approach: the alanine dipeptide system and the proflavine-DNA system, which
exhibits intercalation of proflavine into the DNA minor groove in an aqueous
environment. We model the MD trajectory as a multivariate time series, and our
latent variable model performs the task of multi-step time series prediction.
This idea is inspired by the popular sparse coding approach - to represent each
input sample as a linear combination of few elements taken from a set of
representative patterns.
( 3
min )
Artificial intelligence (AI), machine learning, and deep learning (DL)
methods are becoming increasingly important in the field of biomedical image
analysis. However, to exploit the full potential of such methods, a
representative number of experimentally acquired images containing a
significant number of manually annotated objects is needed as training data.
Here we introduce SYNTA (synthetic data) as a novel approach for the generation
of synthetic, photo-realistic, and highly complex biomedical images as training
data for DL systems. We show the versatility of our approach in the context of
muscle fiber and connective tissue analysis in histological sections. We
demonstrate that it is possible to perform robust and expert-level segmentation
tasks on previously unseen real-world data, without the need for manual
annotations using synthetic training data alone. Being a fully parametric
technique, our approach offers an interpretable and controllable alternative to
Generative Adversarial Networks (GANs) and has the potential to significantly
accelerate quantitative image analysis in a variety of biomedical applications
in microscopy and beyond.
( 3
min )
We present a new method that includes three key components of distributed
optimization and federated learning: variance reduction of stochastic
gradients, partial participation, and compressed communication. We prove that
the new method has optimal oracle complexity and state-of-the-art communication
complexity in the partial participation setting. Regardless of the
communication compression feature, our method successfully combines variance
reduction and partial participation: we get the optimal oracle complexity,
never need the participation of all nodes, and do not require the bounded
gradients (dissimilarity) assumption.
( 2
min )
This document outlines some of the common mistakes that occur when using
machine learning, and what can be done to avoid them. Whilst it should be
accessible to anyone with a basic understanding of machine learning techniques,
it was originally written for research students, and focuses on issues that are
of particular concern within academic research, such as the need to do rigorous
comparisons and reach valid conclusions. It covers five stages of the machine
learning process: what to do before model building, how to reliably build
models, how to robustly evaluate models, how to compare models fairly, and how
to report results.
( 2
min )
A simple and effective method for the alignment of generative models is the
best-of-$n$ policy, where $n$ samples are drawn from a base policy, ranked
according to a reward function, and the highest-ranking one is selected. A commonly
used analytical expression in the literature claims that the KL divergence
between the best-of-$n$ policy and the base policy is equal to $\log (n) -
(n-1)/n.$ We disprove the validity of this claim, and show that it is an upper
bound on the actual KL divergence. We also explore the tightness of this upper
bound in different regimes. Finally, we propose a new estimator for the KL
divergence and empirically show that it provides a tight approximation through
a few examples.
( 2
min )
This work examines the effects of variations in machine learning training
regimes and learning paradigms on the corresponding energy consumption. While
increasing data availability and innovation in high-performance hardware fuel
the training of sophisticated models, they also contribute to a fading
awareness of energy consumption and carbon emissions. Therefore, the goal of this work is to
create awareness about the energy impact of general training parameters and
processes, from the learning rate and batch size to knowledge transfer. Multiple
setups with different hyperparameter initializations are evaluated on two
different hardware configurations to obtain meaningful results. Experiments on
pretraining and multitask training are conducted on top of the baseline results
to determine their potential towards sustainable machine learning.
( 2
min )
Data augmentation is an effective technique for improving the performance of
machine learning models. However, it has not been explored as extensively in
natural language processing (NLP) as it has in computer vision. In this paper,
we propose a novel text augmentation method that leverages the Fill-Mask
feature of the transformer-based BERT model. Our method involves iteratively
masking words in a sentence and replacing them with language model predictions.
We have tested our proposed method on various NLP tasks and found it to be
effective in many cases. Our results are presented along with a comparison to
existing augmentation methods. Experimental results show that our proposed
method significantly improves performance, especially on topic classification
datasets.
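As an illustration of the general idea (not the authors' released code), the
sketch below uses the Hugging Face fill-mask pipeline to iteratively mask
words and substitute the language model's top prediction; the left-to-right
masking schedule and top-1 replacement rule are simplifying assumptions:

    from transformers import pipeline

    # Fill-mask pipeline built on a pre-trained BERT model.
    fill_mask = pipeline("fill-mask", model="bert-base-uncased")

    def augment(sentence: str, rounds: int = 2) -> str:
        """Iteratively mask one word at a time and accept BERT's top guess."""
        words = sentence.split()
        for i in range(min(rounds, len(words))):
            masked = words.copy()
            masked[i] = fill_mask.tokenizer.mask_token   # "[MASK]" for BERT
            prediction = fill_mask(" ".join(masked), top_k=1)[0]
            words[i] = prediction["token_str"]           # LM's replacement word
        return " ".join(words)

    print(augment("the movie was surprisingly good"))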
( 2
min )
According to the World Health Organization (WHO), air pollution kills seven
million people every year. Outdoor air pollution is a major environmental
health problem affecting low, middle, and high-income countries. In the past
few years, the research community has explored IoT-enabled machine learning
applications for outdoor air pollution prediction. The general objective of
this paper is to systematically review applications of machine learning and
Internet of Things (IoT) for outdoor air pollution prediction and the
combination of monitoring sensors and input features used. Two research
questions were formulated for this review. 1086 publications were collected in
the initial PRISMA stage. After the screening and eligibility phases, 37 papers
were selected for inclusion. A cost-based analysis was conducted on the
findings to distinguish prediction enabled by high-cost monitoring, low-cost
IoT, and hybrid setups. Three methods of prediction were identified: time series,
feature-based and spatio-temporal. This review's findings identify major
limitations in applications found in the literature, namely lack of coverage,
lack of diversity of data and lack of inclusion of context-specific features.
This review proposes directions for future research and underlines practical
implications in healthcare, urban planning, global synergy and smart cities.
( 2
min )
Explainability in deep networks has gained increased importance in recent
years. We argue herein that an AI must be charged not just with completing a
task but also with explaining why the task was accomplished as it was. We present a
basic framework -- Task and Explanation Network (TENet) -- which fully
integrates task completion and its explanation. We believe that the field of AI
as a whole should insist -- quite emphatically -- on explainability.
( 2
min )
The generalization error curve of a kernel regression method describes the
exact order of the generalization error under various source conditions, noise
levels, and choices of the regularization parameter, rather than just the
minimax rate. In this work, under mild assumptions, we rigorously provide a
full characterization of the generalization error curves of the kernel gradient
descent method (and a large class of analytic spectral algorithms) in kernel
regression. Consequently, we can sharpen the near inconsistency of kernel
interpolation and clarify the saturation effects of kernel regression
algorithms with higher qualification, among other results. Thanks to neural
tangent kernel theory, these results greatly improve our understanding of the
generalization behavior of training wide neural networks. A novel technical
contribution, the analytic functional argument, might be of independent interest.
( 2
min )
When implementing hierarchical federated learning over wireless networks,
scalability assurance and the ability to handle both interference and device
data heterogeneity are crucial. This work introduces a learning method designed
to address these challenges, along with a scalable transmission scheme that
efficiently uses a single wireless resource through over-the-air computation.
To provide resistance against data heterogeneity, we employ gradient
aggregations. Meanwhile, the impact of interference is minimized through
optimized receiver normalizing factors. For this, we model a multi-cluster
wireless network using stochastic geometry, and characterize the mean squared
error of the aggregation estimations as a function of the network parameters.
We show that despite the interference and the data heterogeneity, the proposed
scheme achieves high learning accuracy and can significantly outperform the
conventional hierarchical algorithm.
( 2
min )
In the rapidly evolving field of artificial intelligence, the creation and
utilization of synthetic datasets have become increasingly significant. This
report delves into the multifaceted aspects of synthetic data, particularly
emphasizing the challenges and potential biases these datasets may harbor. It
explores the methodologies behind synthetic data generation, spanning
traditional statistical models to advanced deep learning techniques, and
examines their applications across diverse domains. The report also critically
addresses the ethical considerations and legal implications associated with
synthetic datasets, highlighting the urgent need for mechanisms to ensure
fairness, mitigate biases, and uphold ethical standards in AI development.
( 2
min )
The last decades have been characterized by unprecedented technological
advances, many of them powered by modern technologies such as Artificial
Intelligence (AI) and Machine Learning (ML). The world has become more
digitally connected than ever, but we face major challenges. One of the most
significant is cybercrime, which has emerged as a global threat to governments,
businesses, and civil societies. The pervasiveness of digital technologies
combined with a constantly shifting technological foundation has created a
complex and powerful playground for cybercriminals, which triggered a surge in
demand for intelligent threat detection systems based on machine and deep
learning. This paper investigates AI-based cyber threat detection to protect
our modern digital ecosystems. The primary focus is on evaluating ML-based
classifiers and ensembles for anomaly-based malware detection and network
intrusion detection and how to integrate those models in the context of network
security, mobile security, and IoT security. The discussion highlights the
challenges when deploying and integrating AI-enabled cybersecurity solutions
into existing enterprise systems and IT infrastructures, including options to
overcome those challenges. Finally, the paper provides future research
directions to further increase the security and resilience of our modern
digital industries, infrastructures, and ecosystems.
( 2
min )
Existing object recognition models have been shown to lack robustness in
diverse geographical scenarios due to significant domain shifts in design and
context. Class representations need to be adapted to more accurately reflect an
object concept under these shifts. In the absence of training data from target
geographies, we hypothesize that geography-specific descriptive knowledge of
object categories can be leveraged to enhance robustness. For this purpose, we
explore the feasibility of probing a large-language model for
geography-specific object knowledge, and we investigate integrating knowledge
in zero-shot and learnable soft prompting with the CLIP vision-language model.
In particular, we propose a geography knowledge regularization method to ensure
that soft prompts trained on a source set of geographies generalize to an
unseen target set of geographies. Our gains on DollarStreet when generalizing
from a model trained only on data from Europe are as large as +2.8 on countries
from Africa, and +4.6 on the hardest classes. We further show competitive
performance vs. few-shot target training, and provide insights into how
descriptive knowledge captures geographical differences.
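A minimal zero-shot sketch of the idea, assuming the Hugging Face CLIP
implementation; the geography-specific descriptors and image path below are
illustrative placeholders rather than the paper's actual knowledge source or
prompt templates:

    import torch
    from PIL import Image
    from transformers import CLIPModel, CLIPProcessor

    model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
    processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

    # Class prompts enriched with geography-specific descriptive knowledge.
    prompts = [
        "a photo of a stove, a metal appliance with burners",
        "a photo of a stove, an open clay hearth used for cooking",
    ]
    image = Image.open("example.jpg")    # placeholder image

    inputs = processor(text=prompts, images=image,
                       return_tensors="pt", padding=True)
    with torch.no_grad():
        logits = model(**inputs).logits_per_image   # (1, num_prompts)
    print(logits.softmax(dim=-1))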
( 2
min )
Defect detection is one of the most important yet challenging tasks in the
quality control stage in the manufacturing sector. In this work, we introduce a
Tensor Convolutional Neural Network (T-CNN) and examine its performance on a
real defect detection application in one of the components of the ultrasonic
sensors produced at Robert Bosch's manufacturing plants. Our quantum-inspired
T-CNN operates on a reduced model parameter space to substantially improve the
training speed and performance of an equivalent CNN model without sacrificing
accuracy. More specifically, we demonstrate how T-CNNs are able to reach the
same performance as classical CNNs as measured by quality metrics, with up to
fifteen times fewer parameters and 4% to 19% faster training times. Our results
demonstrate that the T-CNN greatly outperforms the results of traditional human
visual inspection, providing value in a current real application in
manufacturing.
( 2
min )
Recent research has shown the potential of deep learning in multi-parametric
MRI-based visual pathway (VP) segmentation. However, obtaining labeled data for
training is laborious and time-consuming. Therefore, it is crucial to develop
effective algorithms in situations with limited labeled samples. In this work,
we propose a label-efficient deep learning method with self-ensembling (LESEN).
LESEN incorporates supervised and unsupervised losses, enabling the student and
teacher models to mutually learn from each other, forming a self-ensembling
mean teacher framework. Additionally, we introduce a reliable unlabeled sample
selection (RUSS) mechanism to further enhance LESEN's effectiveness. Our
experiments on the human connectome project (HCP) dataset demonstrate the
superior performance of our method when compared to state-of-the-art
techniques, advancing multimodal VP segmentation for comprehensive analysis in
clinical and research settings. The implementation code will be available at:
https://github.com/aldiak/Semi-Supervised-Multimodal-Visual-Pathway-Delineation.
( 2
min )
A new variant of Newton's method - named Backtracking New Q-Newton's method
(BNQN) - which has strong theoretical guarantees, is easy to implement, and has
good experimental performance, was recently introduced by the third author.
Previous experiments showed some remarkable properties of the basins of
attraction of BNQN when finding roots of polynomials and meromorphic
functions. In general, they look smoother than those of Newton's
method.
In this paper, we continue to experimentally explore in depth this remarkable
phenomenon, and connect BNQN to Newton's flow and Voronoi's diagram. This link
poses a couple of challenging puzzles to be explained. Experiments also
indicate that BNQN is more robust against random perturbations than Newton's
method and Random Relaxed Newton's method.
( 2
min )
Machine learning models underpin many modern financial systems for use cases
such as fraud detection and churn prediction. Most are based on supervised
learning with hand-engineered features, which relies heavily on the
availability of labelled data. Large self-supervised generative models have
shown tremendous success in natural language processing and computer vision,
yet so far they haven't been adapted to multivariate time series of financial
transactions. In this paper, we present a generative pretraining method that
can be used to obtain contextualised embeddings of financial transactions.
Benchmarks on public datasets demonstrate that it outperforms state-of-the-art
self-supervised methods on a range of downstream tasks. We additionally perform
large-scale pretraining of an embedding model using a corpus of data from 180
issuing banks containing 5.1 billion transactions and apply it to the card
fraud detection problem on hold-out datasets. The embedding model significantly
improves value detection rate at high precision thresholds and transfers well
to out-of-domain distributions.
( 2
min )
We introduce ensembles of stochastic neural networks to approximate the
Bayesian posterior, combining stochastic methods such as dropout with deep
ensembles. The stochastic ensembles are formulated as families of distributions
and trained to approximate the Bayesian posterior with variational inference.
We implement stochastic ensembles based on Monte Carlo dropout, DropConnect and
a novel non-parametric version of dropout and evaluate them on a toy problem
and CIFAR image classification. For both tasks, we test the quality of the
posteriors directly against Hamiltonian Monte Carlo simulations. Our results
show that stochastic ensembles provide more accurate posterior estimates than
other popular baselines for Bayesian inference.
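The basic construction can be sketched as a deep ensemble whose members keep
dropout active at test time; the variational training objective and the
paper's non-parametric dropout variant are omitted here:

    import torch
    import torch.nn as nn

    def make_net():
        return nn.Sequential(
            nn.Linear(10, 64), nn.ReLU(), nn.Dropout(p=0.2),
            nn.Linear(64, 3),
        )

    ensemble = [make_net() for _ in range(5)]   # train each member separately

    def posterior_predictive(x, mc_samples=20):
        probs = []
        for net in ensemble:
            net.train()                 # keep dropout stochastic at test time
            with torch.no_grad():
                for _ in range(mc_samples):
                    probs.append(net(x).softmax(dim=-1))
        return torch.stack(probs).mean(dim=0)   # average over members and masks

    x = torch.randn(4, 10)
    print(posterior_predictive(x).shape)        # torch.Size([4, 3])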
( 2
min )
Identifying reaction coordinates (RCs) is an active area of research, given
the crucial role RCs play in determining the progress of a chemical reaction.
The choice of the reaction coordinate is often based on heuristic knowledge.
However, an essential criterion for the choice is that the coordinate should
capture both the reactant and product states unequivocally. Also, the
coordinate should be the slowest one so that all the other degrees of freedom
can easily equilibrate along the reaction coordinate. We used a regularised sparse
autoencoder, an energy-based model, to discover a crucial set of reaction
coordinates. Along with discovering reaction coordinates, our model also
predicts the evolution of a molecular dynamics (MD) trajectory. We showed
that including sparsity-enforcing regularisation helps in choosing a small but
important set of reaction coordinates. We used two model systems to demonstrate
our approach: the alanine dipeptide system and the proflavine-DNA system, which
exhibits intercalation of proflavine into the DNA minor groove in an aqueous
environment. We model the MD trajectory as a multivariate time series, and our
latent variable model performs the task of multi-step time series prediction.
This idea is inspired by the popular sparse coding approach - to represent each
input sample as a linear combination of a few elements taken from a set of
representative patterns.
( 3
min )
Generative AI has opened up a lot of potential in the field of AI. We are seeing numerous uses, including text generation, code generation, summarization, translation, chatbots, and more. One such area that is evolving is using natural language processing (NLP) to unlock new opportunities for accessing data through intuitive SQL queries. Instead of dealing […]
( 10
min )
Expanded LLM use creates new demands on cloud GPU capacity. Splitwise presents an efficient solution by separating the two essential phases of LLM inference, achieving higher throughput within a limited power budget.
( 10
min )
Celebrate the new year with more cloud gaming. Experience the power and performance of the cloud with more than 20 new games to be added to GeForce NOW in January. Start with five games available this week, including The Finals from Embark Studios. And tune in to the NVIDIA Special Address at CES on Monday.
( 7
min )
Finding a transformation between two unknown probability distributions from
finite samples is crucial for modeling complex data distributions and
performing tasks such as sample generation, domain adaptation and statistical
inference. One powerful framework for such transformations is normalizing flow,
which transforms an unknown distribution into a standard normal distribution
using an invertible network. In this paper, we introduce a novel model called
SyMOT-Flow that trains an invertible transformation by minimizing the symmetric
maximum mean discrepancy between samples from two unknown distributions, and an
optimal transport cost is incorporated as regularization to obtain a
short-distance and interpretable transformation. The resulting transformation
leads to more stable and accurate sample generation. Several theoretical
results are established for the proposed model and its effectiveness is
validated with low-dimensional illustrative examples as well as
high-dimensional bi-modality medical image generation through the forward and
reverse flows.
( 2
min )
Identifying constitutive parameters in engineering and biological materials,
particularly those with intricate geometries and mechanical behaviors, remains
a longstanding challenge. The recent advent of Physics-Informed Neural Networks
(PINNs) offers promising solutions, but current frameworks are often limited to
basic constitutive laws and encounter practical constraints when combined with
experimental data. In this paper, we introduce a robust PINN-based framework
designed to identify material parameters for soft materials, specifically those
exhibiting complex constitutive behaviors, under large deformation in plane
stress conditions. Distinctively, our model emphasizes training PINNs with
multi-modal synthetic experimental datasets consisting of full-field
deformation and loading history, ensuring algorithm robustness even with noisy
data. Our results reveal that the PINNs framework can accurately identify
constitutive parameters of the incompressible Arruda-Boyce model for samples
with intricate geometries, maintaining an error below 5%, even with an
experimental noise level of 5%. We believe our framework provides a robust
modulus identification approach for complex solids, especially for those with
geometrical and constitutive complexity.
( 2
min )
Causal inference is a crucial goal of science, enabling researchers to arrive
at meaningful conclusions regarding the predictions of hypothetical
interventions using observational data. Path models, Structural Equation Models
(SEMs), and, more generally, Directed Acyclic Graphs (DAGs), provide a means to
unambiguously specify assumptions regarding the causal structure underlying a
phenomenon. Unlike DAGs, which make very few assumptions about the functional
and parametric form, SEM assumes linearity. This can result in functional
misspecification which prevents researchers from undertaking reliable effect
size estimation. In contrast, we propose Super Learner Equation Modeling, a
path modeling technique integrating machine learning Super Learner ensembles.
We empirically demonstrate its ability to provide consistent and unbiased
estimates of causal effects, its competitive performance for linear models when
compared with SEM, and highlight its superiority over SEM when dealing with
non-linear relationships. We provide open-source code, and a tutorial notebook
with example usage, accentuating the easy-to-use nature of the method.
( 2
min )
How do language models "think"? This paper formulates a probabilistic
cognitive model called the bounded pragmatic speaker, which can characterize
the operation of different variations of language models. Specifically, we
demonstrate that large language models fine-tuned with reinforcement learning
from human feedback (Ouyang et al., 2022) embody a model of thought that
conceptually resembles a fast-and-slow model (Kahneman, 2011), which
psychologists have attributed to humans. We discuss the limitations of
reinforcement learning from human feedback as a fast-and-slow model of thought
and propose avenues for expanding this framework. In essence, our research
highlights the value of adopting a cognitive probabilistic modeling approach to
gain insights into the comprehension, evaluation, and advancement of language
models.
( 2
min )
Pseudo-Hamiltonian neural networks (PHNN) were recently introduced for
learning dynamical systems that can be modelled by ordinary differential
equations. In this paper, we extend the method to partial differential
equations. The resulting model is composed of up to three neural networks,
modelling terms representing conservation, dissipation and external forces, and
discrete convolution operators that can either be learned or be given as input.
We demonstrate numerically the superior performance of PHNN compared to a
baseline model that models the full dynamics by a single neural network.
Moreover, since the PHNN model consists of three parts with different physical
interpretations, these can be studied separately to gain insight into the
system, and the learned model is applicable also if external forces are removed
or changed.
( 2
min )
In this paper, we propose a novel method for joint entity and relation
extraction from unstructured text by framing it as a conditional sequence
generation problem. In contrast to conventional generative information
extraction models that are left-to-right token-level generators, our approach
is \textit{span-based}. It generates a linearized graph where nodes represent
text spans and edges represent relation triplets. Our method employs a
transformer encoder-decoder architecture with pointing mechanism on a dynamic
vocabulary of spans and relation types. Our model can capture the structural
characteristics and boundaries of entities and relations through span
representations while simultaneously grounding the generated output in the
original text thanks to the pointing mechanism. Evaluation on benchmark
datasets validates the effectiveness of our approach, demonstrating competitive
results. Code is available at https://github.com/urchade/ATG.
( 2
min )
In this work we present deep learning implementations of two popular
theoretical constrained optimization algorithms in infinite dimensional Hilbert
spaces, namely, the penalty and the augmented Lagrangian methods. We test these
algorithms on some toy problems originating in either calculus of variations or
physics. We demonstrate that both methods are able to produce decent
approximations for the test problems and are comparable in terms of different
errors. Leveraging the common occurrence of the Lagrange multiplier update rule
being computationally less expensive than solving subproblems in the penalty
method, we achieve significant speedups in cases when the output of the
constraint function is itself a function.
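As a schematic of why the multiplier update is cheap relative to the inner
minimization, here is a toy scalar version of the augmented Lagrangian loop;
the paper's Hilbert-space setting and network parametrization are simplified
away:

    import torch

    # Minimize f(x) subject to c(x) = 0.
    f = lambda x: (x - 2.0) ** 2
    c = lambda x: x - 1.0                 # constraint: x = 1

    x = torch.tensor(0.0, requires_grad=True)
    lam, rho = 0.0, 10.0                  # multiplier estimate, penalty weight
    opt = torch.optim.SGD([x], lr=0.05)

    for outer in range(20):
        for _ in range(100):              # inner subproblem (the costly part)
            opt.zero_grad()
            loss = f(x) + lam * c(x) + 0.5 * rho * c(x) ** 2
            loss.backward()
            opt.step()
        lam += rho * c(x).item()          # cheap multiplier update

    print(f"x = {x.item():.4f}, lambda = {lam:.4f}")  # expect x -> 1, lam -> 2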
( 2
min )
Broadband infrastructure owners do not always know how their customers are
connected in the local networks, which are structured as rooted trees. A recent
study is able to infer the topology of a local network using discrete time
series data from the leaves of the tree (customers). In this study we propose a
contrastive approach for learning a binary event encoder from continuous time
series data. As a preliminary result, we show that our approach has some
potential in learning a valuable encoder.
( 2
min )
This paper introduces HAAQI-Net, a non-intrusive deep learning model for
music quality assessment tailored to hearing aid users. In contrast to
traditional methods like the Hearing Aid Audio Quality Index (HAAQI), HAAQI-Net
utilizes a Bidirectional Long Short-Term Memory (BLSTM) with attention. It
takes an assessed music sample and a hearing loss pattern as input, generating
a predicted HAAQI score. The model employs the pre-trained Bidirectional
Encoder representation from Audio Transformers (BEATs) for acoustic feature
extraction. Comparing predicted scores with ground truth, HAAQI-Net achieves a
Longitudinal Concordance Correlation (LCC) of 0.9257, Spearman's Rank
Correlation Coefficient (SRCC) of 0.9394, and Mean Squared Error (MSE) of
0.0080. Notably, this high performance comes with a substantial reduction in
inference time: from 62.52 seconds (by HAAQI) to 2.71 seconds (by HAAQI-Net),
serving as an efficient music quality assessment model for hearing aid users.
( 2
min )
Modern healthcare often utilises radiographic images alongside textual
reports for diagnostics, encouraging the use of Vision-Language Self-Supervised
Learning (VL-SSL) with large pre-trained models to learn versatile medical
vision representations. However, most existing VL-SSL frameworks are trained
end-to-end, which is computation-heavy and can lose vital prior information
embedded in pre-trained encoders. To address both issues, we introduce the
backbone-agnostic Adaptor framework, which preserves medical knowledge in
pre-trained image and text encoders by keeping them frozen, and employs a
lightweight Adaptor module for cross-modal learning. Experiments on medical
image classification and segmentation tasks across three datasets reveal that
our framework delivers competitive performance while cutting trainable
parameters by over 90% compared to current pre-training approaches. Notably,
when fine-tuned with just 1% of data, Adaptor outperforms several
Transformer-based methods trained on full datasets in medical image
segmentation.
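A skeleton of the frozen-encoder idea, with assumed encoder stand-ins and
dimensions (the paper's actual Adaptor module may differ):

    import torch
    import torch.nn as nn

    # Stand-ins for large pre-trained encoders; kept frozen throughout.
    image_encoder = nn.Sequential(nn.Flatten(), nn.Linear(3 * 224 * 224, 512))
    text_encoder = nn.Linear(768, 512)
    for enc in (image_encoder, text_encoder):
        for p in enc.parameters():
            p.requires_grad = False       # preserve pre-trained knowledge

    # Lightweight cross-modal adaptor: the only trainable component.
    adaptor = nn.MultiheadAttention(embed_dim=512, num_heads=8, batch_first=True)
    optimizer = torch.optim.Adam(adaptor.parameters(), lr=1e-4)

    img = torch.randn(2, 3, 224, 224)     # dummy image batch
    txt = torch.randn(2, 16, 768)         # dummy 16-token report embeddings
    img_feat = image_encoder(img).unsqueeze(1)   # (B, 1, 512)
    txt_feat = text_encoder(txt)                 # (B, 16, 512)
    fused, _ = adaptor(img_feat, txt_feat, txt_feat)
    print(fused.shape)                           # torch.Size([2, 1, 512])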
( 2
min )
We present a new high-probability PAC-Bayes oracle bound for unbounded
losses. This result can be understood as a PAC-Bayes version of the Chernoff
bound. The proof technique relies on uniformly bounding the tail of a certain
random variable based on the Cram\'er transform of the loss. We highlight two
applications of our main result. First, we show that our bound solves the open
problem of optimizing the free parameter on many PAC-Bayes bounds. Finally, we
show that our approach allows working with flexible assumptions on the loss
function, resulting in novel bounds that generalize previous ones and can be
minimized to obtain Gibbs-like posteriors.
( 2
min )
We present a fast and high-quality codec language model for parallel audio
generation. While SoundStorm, a state-of-the-art parallel audio generation
model, accelerates inference speed compared to autoregressive models, it still
suffers from slow inference due to iterative sampling. To resolve this problem,
we propose Group-Masked Language Modeling~(G-MLM) and Group Iterative Parallel
Decoding~(G-IPD) for efficient parallel audio generation. Both the training and
sampling schemes enable the model to synthesize high-quality audio with a small
number of iterations by effectively modeling the group-wise conditional
dependencies. In addition, our model employs a cross-attention-based
architecture to capture the speaker style of the prompt voice and improves
computational efficiency. Experimental results demonstrate that our proposed
model outperforms the baselines in prompt-based audio generation.
( 2
min )
The prediction of rolling bearing lifespan is of significant importance in
industrial production. However, the scarcity of high-quality, full lifecycle
data has been a major constraint in achieving precise predictions. To address
this challenge, this paper introduces the CVGAN model, a novel framework
capable of generating one-dimensional vibration signals in both horizontal and
vertical directions, conditioned on historical vibration data and remaining
useful life. In addition, we propose an autoregressive generation method that
can iteratively utilize previously generated vibration information to guide the
generation of current signals. The effectiveness of the CVGAN model is
validated through experiments conducted on the PHM 2012 dataset. Our findings
demonstrate that the CVGAN model, in terms of both MMD and FID metrics,
outperforms many advanced methods in both autoregressive and non-autoregressive
generation modes. Notably, training using the full lifecycle data generated by
the CVGAN model significantly improves the performance of the predictive model.
This result highlights the effectiveness of the data generated by the CVGAN in
enhancing the predictive power of these models.
( 2
min )
Natural policy gradient (NPG) and its variants are widely-used policy search
methods in reinforcement learning. Inspired by prior work, a new NPG variant
coined NPG-HM is developed in this paper, which utilizes the Hessian-aided
momentum technique for variance reduction, while the sub-problem is solved via
the stochastic gradient descent method. It is shown that NPG-HM can achieve the
global last iterate $\epsilon$-optimality with a sample complexity of
$\mathcal{O}(\epsilon^{-2})$, which is the best known result for natural policy
gradient type methods under the generic Fisher non-degenerate policy
parameterizations. The convergence analysis is built upon a relaxed weak
gradient dominance property tailored for NPG under the compatible function
approximation framework, as well as a neat way to decompose the error when
handling the sub-problem. Moreover, numerical experiments on Mujoco-based
environments demonstrate the superior performance of NPG-HM over other
state-of-the-art policy gradient methods.
( 2
min )
Exploring methods and techniques of machine learning (ML) to address specific
challenges in various fields is essential. In this work, we tackle a problem in
the domain of Cheminformatics; that is, providing a suitable solution to aid in
predicting the activity of a chemical compound to the best extent possible. To
address the problem at hand, this study conducts experiments on 100 different
combinations of existing techniques. These solutions are then selected based on
a set of criteria that includes the G-means, F1-score, and AUC metrics. The
results have been tested on a dataset of about 10,000 chemical compounds from
PubChem that have been classified according to their activity.
( 2
min )
Suicide is recognized as one of the most serious concerns in modern society,
causing tragedies that affect countries, communities, and
families. There are many factors that lead to suicidal ideations. Early
detection of suicidal ideations can help to prevent suicide occurrence by
providing the victim with the required professional support, especially when
the victim does not recognize the danger of having suicidal ideations. As
technology usage has increased, people share and express their ideations
digitally via social media, chatbots, and other digital platforms. In this
paper, we propose a novel, simple deep learning-based model to detect suicidal
ideations in digital content, mainly focusing on chatbots as the primary data
source. In addition, we provide a framework that employs the proposed suicide
detection integration with a chatbot-based support system.
( 2
min )
The primary goal of this project is to develop privacy-preserving machine
learning model training techniques for fNIRS data. This project will build a
local model in a centralized setting with both differential privacy (DP) and
certified robustness. It will also explore collaborative federated learning to
train a shared model between multiple clients without sharing local fNIRS
datasets. To prevent unintentional private information leakage of such clients'
private datasets, we will also implement DP in the federated learning setting.
( 2
min )
Exploring generative model training for synthetic tabular data, specifically
in sequential contexts such as credit card transaction data, presents
significant challenges. This paper addresses these challenges, focusing on
attaining both high fidelity to actual data and optimal utility for machine
learning tasks. We introduce five pre-processing schemas to enhance the
training of the Conditional Probabilistic Auto-Regressive Model (CPAR),
demonstrating incremental improvements in the synthetic data's fidelity and
utility. Upon achieving satisfactory fidelity levels, our attention shifts to
training fraud detection models tailored for time-series data, evaluating the
utility of the synthetic data. Our findings offer valuable insights and
practical guidelines for synthetic data practitioners in the finance sector,
transitioning from real to synthetic datasets for training purposes, and
illuminating broader methodologies for synthesizing credit card transaction
time series.
( 2
min )
In this paper, we present an unsupervised approach for frequency sub-band
allocation in wireless networks using graph-based learning. We consider a dense
deployment of subnetworks in the factory environment with a limited number of
sub-bands which must be optimally allocated to coordinate inter-subnetwork
interference. We model the subnetwork deployment as a conflict graph and
propose an unsupervised learning approach inspired by the graph colouring
heuristic and the Potts model to optimize the sub-band allocation using graph
neural networks. The numerical evaluation shows that the proposed method
achieves close performance to the centralized greedy colouring sub-band
allocation heuristic with lower computational time complexity. In addition, it
incurs reduced signalling overhead compared to iterative optimization
heuristics that require all the mutual interfering channel information. We
further demonstrate that the method is robust to different network settings.
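The Potts-model flavour of the loss can be illustrated by relaxing the
colouring into soft assignments and penalizing the expected number of
conflicting edges; a plain logits parametrization stands in for the paper's
graph neural network, and the conflict graph below is a made-up example:

    import torch

    num_nodes, num_bands = 12, 3
    edges = torch.tensor([[0, 1], [1, 2], [2, 0], [3, 4], [4, 5],
                          [5, 3], [0, 3], [6, 7], [8, 9], [10, 11]])

    logits = (0.1 * torch.randn(num_nodes, num_bands)).requires_grad_()
    opt = torch.optim.Adam([logits], lr=0.1)

    for step in range(300):
        opt.zero_grad()
        p = logits.softmax(dim=-1)        # soft sub-band assignment per node
        # Expected number of conflicting edges: sum over edges of <p_i, p_j>.
        loss = (p[edges[:, 0]] * p[edges[:, 1]]).sum()
        loss.backward()
        opt.step()

    assignment = logits.argmax(dim=-1)
    conflicts = (assignment[edges[:, 0]] == assignment[edges[:, 1]]).sum()
    print(f"conflicting edges: {conflicts.item()}")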
( 2
min )
This paper uses the MIMIC-IV dataset to examine the fairness and bias in an
XGBoost binary classification model predicting the Intensive Care Unit (ICU)
length of stay (LOS). Highlighting the critical role of the ICU in managing
critically ill patients, the study addresses the growing strain on ICU
capacity. It emphasizes the significance of LOS prediction for resource
allocation. The research reveals class imbalances in the dataset across
demographic attributes and employs data preprocessing and feature extraction.
While the XGBoost model performs well overall, disparities across race and
insurance attributes reflect the need for tailored assessments and continuous
monitoring. The paper concludes with recommendations for fairness-aware machine
learning techniques for mitigating biases and the need for collaborative
efforts among healthcare professionals and data scientists.
( 2
min )
Leukemia is one of the most common and life-threatening types of cancer.
Medical data on a patient's critical parameters contain valuable hidden
information, and deep learning can be used to extract it. In this paper,
AutoEncoders have been used to develop valuable features to help the precision
of leukemia diagnosis. We searched for the best activation function and
optimizer to use in the AutoEncoder and designed the best architecture for
this neural network. The proposed architecture is compared with this area's
classical machine learning models. Our proposed method performs better than
other machine learning models, with precision and F1-score higher by more than 11%.
( 2
min )
Reservoir computing is a machine learning technique which has been shown to
be able to replicate the chaotic attractor, including the fractal dimension and
the entire Lyapunov spectrum, of the dynamical system on which it is trained.
We quantitatively relate the generalized synchronization dynamics of a driven
reservoir computer during the training stage to the performance of the
autonomous reservoir computer at the attractor reconstruction task. We show
that, for successful attractor reconstruction and Lyapunov exponent estimation,
the largest conditional Lyapunov exponent of the driven reservoir must be
significantly smaller (more negative) than the smallest (most negative)
Lyapunov exponent of the true system. We find that the maximal conditional
Lyapunov exponent of the reservoir depends strongly on the spectral radius of
the reservoir adjacency matrix, and therefore, for attractor reconstruction and
Lyapunov exponent estimation, small spectral radius reservoir computers perform
better in general. Our arguments are supported by numerical examples on
well-known chaotic systems.
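A minimal echo state network sketch shows where the spectral radius enters;
the reservoir size, driving signal, and ridge parameter are illustrative
placeholders:

    import numpy as np

    rng = np.random.default_rng(1)
    N, T, rho = 200, 5000, 0.4     # reservoir size, steps, spectral radius

    W = rng.normal(size=(N, N))
    W *= rho / np.abs(np.linalg.eigvals(W)).max()   # rescale adjacency matrix
    W_in = rng.uniform(-0.5, 0.5, size=(N, 1))

    u = np.sin(0.05 * np.arange(T + 1))[:, None]    # stand-in driving signal
    r = np.zeros((T, N))
    for t in range(1, T):
        r[t] = np.tanh(W @ r[t - 1] + W_in @ u[t])  # driven reservoir state

    # Ridge-regress a next-step readout; the trained reservoir can then run
    # autonomously by feeding its own prediction back as input.
    ridge = 1e-6
    W_out = np.linalg.solve(r.T @ r + ridge * np.eye(N), r.T @ u[1:T + 1])
    print("readout shape:", W_out.shape)            # (200, 1)

Lowering rho makes the conditional Lyapunov exponent of the driven reservoir
more negative, which is exactly the regime the analysis above favours for
attractor reconstruction.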
( 2
min )
In this paper we show how tensor networks help in developing explainability
of machine learning algorithms. Specifically, we develop an unsupervised
clustering algorithm based on Matrix Product States (MPS) and apply it in the
context of a real use-case of adversary-generated threat intelligence. Our
investigation proves that MPS rival traditional deep learning models such as
autoencoders and GANs in terms of performance, while providing much richer
model interpretability. Our approach naturally facilitates the extraction of
feature-wise probabilities, Von Neumann Entropy, and mutual information,
offering a compelling narrative for classification of anomalies and fostering
an unprecedented level of transparency and interpretability, something
fundamental to understand the rationale behind artificial intelligence
decisions.
( 2
min )
MIT researchers introduce a method that uses artificial intelligence to automate the explanation of complex neural networks.
( 11
min )
A new study finds that language regions in the left hemisphere light up when reading uncommon sentences, while straightforward sentences elicit little response.
( 9
min )
Artificial Intelligence (AI) has been around for many decades but now it has become a buzzword even among non-technical people because of the generative AI models like ChatGPT, Bard, Scribe, Claude, DALL·E 2, and a lot more. AI has moved beyond its sci-fi origins to reality, creating human-like content and powering self-driving cars. However, despite…
( 21
min )
Inventory Routing Problem (IRP) is a crucial challenge in supply chain
management as it involves optimizing efficient route selection while
considering the uncertainty of inventory demand planning. To solve IRPs,
usually a two-stage approach is employed, where demand is predicted using
machine learning techniques first, and then an optimization algorithm is used
to minimize routing costs. Our experiment shows machine learning models fall
short of achieving perfect accuracy because inventory levels are influenced by
the dynamic business environment, which, in turn, affects the optimization
problem in the next stage, resulting in sub-optimal decisions. In this paper,
we formulate and propose a decision-focused learning-based approach to solving
real-world IRPs. This approach directly integrates inventory prediction and
routing optimization within an end-to-end system potentially ensuring a robust
supply chain strategy.
( 2
min )
We analyze a stochastic approximation algorithm for decision-dependent
problems, wherein the data distribution used by the algorithm evolves along the
iterate sequence. The primary examples of such problems appear in performative
prediction and its multiplayer extensions. We show that under mild assumptions,
the deviation between the average iterate of the algorithm and the solution is
asymptotically normal, with a covariance that clearly decouples the effects of
the gradient noise and the distributional shift. Moreover, building on the work
of H\'ajek and Le Cam, we show that the asymptotic performance of the algorithm
with averaging is locally minimax optimal.
( 2
min )
We try to generate new bridge types using generative artificial intelligence
technology. A symmetric structured image dataset of three-span beam bridges,
arch bridges, cable-stayed bridges and suspension bridges is used. Based on
the Python programming language, the TensorFlow and Keras deep learning
framework, as well as the Wasserstein loss function and Lipschitz constraints,
a generative adversarial network is constructed and trained. By sampling from
the obtained low-dimensional bridge-type latent space, new bridge types with
asymmetric structures can be generated. The generative adversarial network can
create new bridge types by organically combining different structural
components of original human-designed bridge types, exhibiting a certain
degree of original creativity. Generative artificial intelligence technology
can open up imagination space and inspire humanity.
( 2
min )
Despite the efficient market hypothesis, many studies suggest the existence
of inefficiencies in the stock market leading to the development of techniques
to gain above-market returns. Systematic trading has undergone significant
advances in recent decades with deep learning schemes emerging as a powerful
tool for analyzing and predicting market behavior. In this paper, a method is
proposed that is inspired by how professional technical analysts trade. This
scheme looks at stock prices of the previous 600 days and predicts whether the
stock price will rise or fall by 10% or 20% within the next D days. In
addition, the proposed method uses ResNet's (a deep learning model) skip
connections and logits to increase the confidence of the prediction. The model
was trained and tested using historical data from both the Korean and US stock
markets. We show that using a period label of 5 gives the best result: on the
Korean market the method achieved a profit more than 39% above the market
return, and more than 40% above the market return on the US market.
( 2
min )
In the arena of privacy-preserving machine learning, differentially private
stochastic gradient descent (DP-SGD) has outstripped the objective perturbation
mechanism in popularity and interest. Though unrivaled in versatility, DP-SGD
requires a non-trivial privacy overhead (for privately tuning the model's
hyperparameters) and a computational complexity which might be extravagant for
simple models such as linear and logistic regression. This paper revamps the
objective perturbation mechanism with tighter privacy analyses and new
computational tools that boost it to perform competitively with DP-SGD on
unconstrained convex generalized linear problems.
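For orientation, here is a schematic version of classical objective
perturbation for L2-regularized logistic regression, in the spirit of
Chaudhuri et al. (2011); the noise calibration is illustrative only and does
not reflect this paper's tightened analysis:

    import numpy as np
    from scipy.optimize import minimize

    rng = np.random.default_rng(0)
    n, d, lam, eps = 1000, 5, 1.0, 1.0

    X = rng.normal(size=(n, d))
    X /= np.maximum(1.0, np.linalg.norm(X, axis=1, keepdims=True))  # bound rows
    y = np.sign(X @ rng.normal(size=d) + 0.1 * rng.normal(size=n))

    # Noise vector with density proportional to exp(-(eps / 2) * ||b||).
    direction = rng.normal(size=d)
    direction /= np.linalg.norm(direction)
    b = direction * rng.gamma(shape=d, scale=2.0 / eps)

    def perturbed_objective(w):
        margins = y * (X @ w)
        return (np.logaddexp(0.0, -margins).mean()   # logistic loss
                + 0.5 * lam * w @ w                  # L2 regularizer
                + b @ w / n)                         # privacy perturbation

    w_priv = minimize(perturbed_objective, np.zeros(d)).x
    print("private weights:", np.round(w_priv, 3))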
( 2
min )
In the intricate architecture of the mammalian central nervous system,
neurons form populations. Axonal bundles communicate between these clusters
using spike trains. However, these neuron populations' precise encoding and
operations have yet to be discovered. In our analysis, the starting point is a
state-of-the-art mechanistic model of a generic neuron endowed with plasticity.
From this simple framework emerges a subtle mathematical construct: The
representation and manipulation of information can be precisely characterized
by an algebra of convex cones. Furthermore, these neuron populations are not
merely passive transmitters. They act as operators within this algebraic
structure, mirroring the functionality of a low-level programming language.
When these populations interconnect, they embody succinct yet potent algebraic
expressions. These networks allow them to implement many operations, such as
specialization, generalization, novelty detection, dimensionality reduction,
inverse modeling, prediction, and associative memory. In broader terms, this
work illuminates the potential of matrix embeddings in advancing our
understanding in fields like cognitive science and AI. These embeddings enhance
the capacity for concept processing and hierarchical description over their
vector counterparts.
( 3
min )
Deep learning for Hamiltonian regression of quantum systems in material
research necessitates satisfying the covariance laws, among which achieving
SO(3)-equivariance without sacrificing the expressiveness of networks remains
an elusive challenge due to the restrictions on non-linear mappings required
to guarantee theoretical equivariance. To alleviate the
covariance-expressiveness dilemma, we propose a hybrid framework with two
cascaded regression stages. The first stage, with a theoretically-guaranteed
covariant neural network modeling symmetry properties of 3D atom systems,
yields theoretically covariant features and baseline Hamiltonian predictions,
assisting the second stage in learning covariance. Meanwhile, the second stage,
powered by a non-linear 3D graph Transformer network we propose for structural
modeling of 3D atomic systems, refines the first stage's output as a
fine-grained prediction of Hamiltonians with better expressiveness capability.
The combination of a theoretically covariant yet inevitably less expressive
model with a highly expressive non-linear network enables precise,
generalizable predictions while maintaining robust covariance under coordinate
transformations. Our method achieves state-of-the-art performance in
Hamiltonian prediction for electronic structure calculations, confirmed through
experiments on five crystalline material databases.
( 2
min )
In continual learning from demonstration (CLfD), a robot learns a sequence of
real-world motion skills continually from human demonstrations. Recently,
hypernetworks have been successful in solving this problem. In this paper, we
perform an exploratory study of the effects of different optimizers,
initializers, and network architectures on the continual learning performance
of hypernetworks for CLfD. Our results show that adaptive learning rate
optimizers work well, but initializers specially designed for hypernetworks
offer no advantages for CLfD. We also show that hypernetworks that are capable
of stable trajectory predictions are robust to different network architectures.
Our open-source code is available at
https://github.com/sebastianbergner/ExploringCLFD.
( 2
min )
The tasks of designing messenger RNAs and non-coding RNAs are discrete
optimization problems, and several versions of these problems are NP-hard. As
an alternative to commonly used local search methods, we formulate these
problems as continuous optimization and develop a general framework for this
optimization based on a new concept of "expected partition function". The basic
idea is to start with a distribution over all possible candidate sequences, and
extend the objective function from a sequence to a distribution. We then use
gradient descent-based optimization methods to improve the extended objective
function, and the distribution will gradually shrink towards a one-hot sequence
(i.e., a single sequence). We consider two important case studies within this
framework, the mRNA design problem optimizing for partition function (i.e.,
ensemble free energy) and the non-coding RNA design problem optimizing for
conditional (i.e., Boltzmann) probability. In both cases, our approach
demonstrates promising preliminary results. We make our code available at
https://github.com/KuNyaa/RNA_Design_codebase.
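A toy sketch of the distribution-over-sequences relaxation, with a
decomposable position-wise score standing in for a real (expected) partition
function:

    import torch

    L, A = 8, 4                               # sequence length, alphabet size
    target_scores = torch.randn(L, A)         # assumed toy scoring table

    logits = torch.zeros(L, A, requires_grad=True)
    opt = torch.optim.Adam([logits], lr=0.2)

    for step in range(200):
        opt.zero_grad()
        probs = logits.softmax(dim=-1)        # distribution over sequences
        expected = (probs * target_scores).sum()   # exact expected objective
        entropy = -(probs * probs.clamp_min(1e-9).log()).sum()
        loss = -expected + 0.01 * entropy     # entropy penalty sharpens dist.
        loss.backward()
        opt.step()

    print("argmax sequence:", logits.argmax(dim=-1).tolist())

Because the toy objective decomposes over positions, the expectation is exact;
in the paper's setting, the expected partition function plays the role of this
decomposable score.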
( 2
min )
We introduce a graph-aware autoencoder ensemble framework, with associated
formalisms and tooling, designed to facilitate deep learning for scholarship in
the humanities. By composing sub-architectures to produce a model isomorphic to
a humanistic domain we maintain interpretability while providing function
signatures for each sub-architectural choice, allowing both traditional and
computational researchers to collaborate without disrupting established
practices. We illustrate a practical application of our approach to a
historical study of the American post-Atlantic slave trade, and make several
specific technical contributions: a novel hybrid graph-convolutional
autoencoder mechanism, batching policies for common graph topologies, and
masking techniques for particular use-cases. The effectiveness of the framework
for broadening participation of diverse domains is demonstrated by a growing
suite of two dozen studies, both collaborations with humanists and established
tasks from machine learning literature, spanning a variety of fields and data
modalities. We make performance comparisons of several different architectural
choices and conclude with an ambitious list of imminent next steps for this
research.
( 2
min )
Experimental particle physics uses machine learning for many tasks, where
one application is to classify signal and background events. The classification
can be used to bin an analysis region to enhance the expected significance for
a mass resonance search. In natural language processing, one of the leading
neural network architectures is the transformer. In this work, an event
classifier transformer is proposed to bin an analysis region, in which the
network is trained with special techniques. The techniques developed here can
enhance the significance and reduce the correlation between the network's
output and the reconstructed mass. It is found that this trained network can
perform better than boosted decision trees and feed-forward networks.
( 2
min )
One among several advantages of measure transport methods is that they allow
for a unified framework for processing and analysis of data distributed
according to a wide class of probability measures. Within this context, we
present results from computational studies aimed at assessing the potential of
measure transport techniques, specifically, the use of triangular transport
maps, as part of a workflow intended to support research in the biological
sciences. Scarce data scenarios, which are common in domains such as radiation
biology, are of particular interest. We find that when data is scarce, sparse
transport maps are advantageous. In particular, statistics gathered from
computing series of (sparse) adaptive transport maps, trained on a series of
randomly chosen subsets of the set of available data samples, leads to
uncovering information hidden in the data. As a result, in the radiation
biology application considered here, this approach provides a tool for
generating hypotheses about gene relationships and their dynamics under
radiation exposure.
( 2
min )
We develop a new efficient sequential approximate leverage score algorithm,
SALSA, using methods from randomized numerical linear algebra (RandNLA) for
large matrices. We demonstrate that, with high probability, the accuracy of
SALSA's approximations is within $(1 + O({\varepsilon}))$ of the true leverage
scores. In addition, we show that the theoretical computational complexity and
numerical accuracy of SALSA surpass existing approximations. These theoretical
results are subsequently utilized to develop an efficient algorithm, named
LSARMA, for fitting an appropriate ARMA model to large-scale time series data.
Our proposed algorithm is, with high probability, guaranteed to find the
maximum likelihood estimates of the parameters for the true underlying ARMA
model. Furthermore, it has a worst-case running time that significantly
improves those of the state-of-the-art alternatives in big data regimes.
Empirical results on large-scale data strongly support these theoretical
results and underscore the efficacy of our new approach.
( 2
min )
We try to generate new bridge types using generative artificial intelligence
technology. Grayscale images of the bridge facade with varying component
widths were rendered with the 3dsMax animation software, and the OpenCV module
then applied geometric transformations (rotation, horizontal scaling, vertical
scaling) to obtain an image dataset of three-span beam bridges, arch bridges,
cable-stayed bridges and suspension bridges. Based on the Python programming
language and the TensorFlow and Keras deep learning framework, a variational
autoencoder was constructed and trained, yielding a low-dimensional
bridge-type latent space that is convenient for vector operations. The
variational autoencoder can combine two original human-designed bridge types
into a new bridge type. Generative artificial intelligence technology can
assist bridge designers in bridge-type innovation, and can be used as a
copilot.
( 2
min )
While Hopfield networks are known as paradigmatic models for memory storage
and retrieval, modern artificial intelligence systems mainly stand on the
machine learning paradigm. We show that it is possible to formulate a
teacher-student self-supervised learning problem with Boltzmann machines in
terms of a suitable generalization of the Hopfield model with structured
patterns, where the spin variables are the machine weights and patterns
correspond to the training set's examples. We analyze the learning performance
by studying the phase diagram in terms of the training set size, the dataset
noise and the inference temperature (i.e. the weight regularization). With a
small but informative dataset the machine can learn by memorization. With a
noisy dataset, an extensive number of examples above a critical threshold is
needed. In this regime, the memory storage limits of the system become an
opportunity for a learning regime in which the system can
generalize.
( 2
min )
This work focuses on plant leaf disease classification and explores three
crucial aspects: adversarial training, model explainability, and model
compression. The models' robustness against adversarial attacks is enhanced
through adversarial training, ensuring accurate classification even in the
presence of threats. Leveraging explainability techniques, we gain insights
into the model's decision-making process, improving trust and transparency.
Additionally, we explore model compression techniques to optimize computational
efficiency while maintaining classification performance. Through our
experiments, we find that, on a benchmark dataset, robustness can come at the
price of classification accuracy, with performance reductions of 3%-20% on
regular tests and gains of 50%-70% on adversarial attack tests. We also
demonstrate that a student model can be 15-25 times more computationally
efficient for a slight performance reduction, distilling the knowledge of more
complex models.
( 2
min )
In this paper, we propose a novel and general framework to construct tight
framelet systems on graphs with localized supports based on hierarchical
partitions. Our construction provides parametrized graph framelet systems with
great generality based on partition trees, by which we are able to find the
size of a low-dimensional subspace that best fits the low-rank structure of a
family of signals. The orthogonal decomposition of subspaces provides a key
ingredient for the definition of "generalized vanishing moments" for graph
framelets. In a data-adaptive setting, the graph framelet systems can be
learned by solving an optimization problem on Stiefel manifolds with respect to
our parameterization. Moreover, such graph framelet systems can be further
improved by solving a subsequent optimization problem on Stiefel manifolds,
aiming at providing the utmost sparsity for a given family of graph signals.
Experimental results show that our learned graph framelet systems perform
superiorly in non-linear approximation and denoising tasks.
( 2
min )
We consider the gradient descent flow widely used for the minimization of the
$\mathcal{L}^2$ cost function in Deep Learning networks, and introduce two
modified versions: one adapted for the overparametrized setting, and the other
for the underparametrized setting. Both have a clear and natural invariant
geometric meaning, taking into account the pullback vector bundle structure in
the overparametrized setting and the pushforward vector bundle structure in the
underparametrized setting. In the overparametrized case, we prove that,
provided that a rank condition holds, all orbits of the modified gradient
descent drive the $\mathcal{L}^2$ cost to its global minimum at a uniform
exponential convergence rate; one thereby obtains an a priori stopping time for
any prescribed proximity to the global minimum. We point out relations of the
latter to sub-Riemannian geometry.
( 2
min )
Magnetic particle imaging (MPI) is an emerging medical imaging modality which
has gained increasing interest in recent years. Among the benefits of MPI are
its high temporal resolution, and that the technique does not expose the
specimen to any kind of ionizing radiation. It is based on the non-linear
response of magnetic nanoparticles to an applied magnetic field. From the
electric signal measured in receive coils, the particle concentration has to be
reconstructed. Due to the ill-posedness of the reconstruction problem, various
regularization methods have been proposed for reconstruction ranging from early
stopping methods, via classical Tikhonov regularization and iterative methods
to modern machine learning approaches. In this work, we contribute to the
latter class: we propose a plug-and-play approach based on a generic zero-shot
denoiser with an $\ell^1$-prior. Moreover, we develop parameter selection
strategies. Finally, we quantitatively and qualitatively evaluate the proposed
algorithmic scheme on the 3D Open MPI data set with different levels of
preprocessing.
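As a rough illustration of such a plug-and-play scheme, the following sketch
alternates a data-consistency gradient step, an $\ell^1$ proximal step, and a
generic denoiser; the operator `A`, the step size, and the `denoiser` callable
are placeholders, not the paper's implementation.

```python
import numpy as np

def pnp_reconstruct(A, y, denoiser, lam=1e-3, n_iter=100):
    """Plug-and-play proximal gradient sketch for
    min_x 0.5*||Ax - y||^2 + lam*||x||_1, with a generic zero-shot denoiser
    applied after each iteration as an additional learned prior."""
    tau = 1.0 / np.linalg.norm(A, 2) ** 2     # step size from a Lipschitz bound
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        grad = A.T @ (A @ x - y)              # data-consistency gradient
        x = x - tau * grad
        x = np.sign(x) * np.maximum(np.abs(x) - tau * lam, 0.0)  # l1 prox
        x = denoiser(x)                       # plug-and-play denoising step
        x = np.maximum(x, 0.0)                # concentrations are non-negative
    return x
```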
( 3
min )
Combinatorial Optimization (CO) problems over graphs appear routinely in many
applications such as in optimizing traffic, viral marketing in social networks,
and matching for job allocation. Due to their combinatorial nature, these
problems are often NP-hard. Existing approximation algorithms and heuristics
rely on the search space to find the solutions and become time-consuming when
this space is large. In this paper, we design a neural method called COMBHelper
to reduce this space and thus improve the efficiency of the traditional CO
algorithms based on node selection. Specifically, it employs a Graph Neural
Network (GNN) to identify promising nodes for the solution set. This pruned
search space is then fed to the traditional CO algorithms. COMBHelper also uses
a Knowledge Distillation (KD) module and a problem-specific boosting module to
bring further efficiency and efficacy. Our extensive experiments show that the
traditional CO algorithms with COMBHelper are at least 2 times faster than
their original versions.
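A minimal sketch of the pruning idea, assuming a dense adjacency matrix and a
simple one-round aggregation GNN (both illustrative simplifications of the
actual COMBHelper architecture):

```python
import torch
import torch.nn as nn

class NodeScorer(nn.Module):
    """One-round mean-aggregation GNN followed by an MLP that scores how
    promising each node is for the solution set (illustrative stand-in)."""
    def __init__(self, in_dim, hidden=64):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(2 * in_dim, hidden), nn.ReLU(),
                                 nn.Linear(hidden, 1))

    def forward(self, x, adj):
        # x: (n, in_dim) node features; adj: dense (n, n) adjacency matrix.
        deg = adj.sum(1, keepdim=True).clamp(min=1)
        h = torch.cat([x, adj @ x / deg], dim=1)   # self + mean-neighbor features
        return self.mlp(h).squeeze(-1)             # one score per node

def prune_search_space(scores, keep_ratio=0.2):
    # Keep only the top-scoring nodes; the traditional CO algorithm (e.g.,
    # a greedy heuristic) then searches this reduced node set.
    k = max(1, int(keep_ratio * scores.numel()))
    return torch.topk(scores, k).indices.tolist()
```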
( 2
min )
The reason behind the remarkable properties of High-Entropy Alloys (HEAs) is
rooted in the diverse phases and the crystal structures they contain. In the
realm of material informatics, employing machine learning (ML) techniques to
classify phases and crystal structures of HEAs has gained considerable
significance. In this study, we assembled a new collection of 1345 HEAs with
varying compositions to predict phases. Within this collection, there were 705
sets of data that were utilized to predict the crystal structures with the help
of thermodynamics and electronic configuration. Our study introduces a
methodical feature-selection framework based on the Pearson correlation
coefficient, which retains strongly correlated features to increase prediction
accuracy.
This study employed five distinct boosting algorithms to predict phases and
crystal structures, offering an enhanced guideline for improving the accuracy
of these predictions. Among all these algorithms, XGBoost gives the highest
accuracy of prediction (94.05%) for phases and LightGBM gives the highest
accuracy for predicting the crystal structure of the phases (90.07%). We
quantified the influence of individual parameters on the models' accuracy and
developed a new approach to elucidate the contribution of each parameter to
phase prediction and crystal structure prediction.
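A hedged sketch of the described pipeline, Pearson-based feature selection
followed by a boosting classifier; the file name, correlation threshold, and
hyperparameters are hypothetical:

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier

# Hypothetical table: thermodynamic/electronic descriptors plus a 'phase' label.
df = pd.read_csv("hea_phases.csv")            # placeholder file name
X, y = df.drop(columns=["phase"]), df["phase"].astype("category").cat.codes

# Keep features whose absolute Pearson correlation with the target exceeds a
# threshold (the selection idea, simplified; 0.3 is an assumed cutoff).
corr = X.apply(lambda col: np.corrcoef(col, y)[0, 1])
selected = corr.index[corr.abs() > 0.3]

X_tr, X_te, y_tr, y_te = train_test_split(X[selected], y, test_size=0.2,
                                          random_state=0, stratify=y)
model = XGBClassifier(n_estimators=300, max_depth=4, learning_rate=0.1)
model.fit(X_tr, y_tr)
print("test accuracy:", model.score(X_te, y_te))
```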
( 3
min )
Accumulated Local Effects (ALE) is a model-agnostic approach for global
explanations of the results of black-box machine learning (ML) algorithms.
There are at least three challenges with conducting statistical inference based
on ALE: ensuring the reliability of ALE analyses, especially in the context of
small datasets; intuitively characterizing a variable's overall effect in ML;
and making robust inferences from ML data analysis. In response, we introduce
innovative tools and techniques for statistical inference using ALE,
establishing bootstrapped confidence intervals tailored to dataset size and
introducing ALE effect size measures that intuitively indicate effects on both
the outcome variable scale and a normalized scale. Furthermore, we demonstrate
how to use these tools to draw reliable statistical inferences, reflecting the
flexible patterns ALE adeptly highlights, with implementations available in the
'ale' package in R. This work propels the discourse on ALE and its
applicability in ML and statistical analysis forward, offering practical
solutions to prevailing challenges in the field.
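The following sketch illustrates the general idea of bootstrapped confidence
bands for a first-order ALE curve; it is a simplified stand-in, not the
implementation in the 'ale' R package (for instance, it omits the usual
centering of the ALE curve):

```python
import numpy as np

def ale_first_order(model, X, feature, n_bins=20):
    """Minimal first-order ALE for one numeric feature: accumulate local
    prediction differences across quantile bins (centering omitted)."""
    x = X[:, feature]
    bins = np.quantile(x, np.linspace(0, 1, n_bins + 1))
    effects = []
    for lo, hi in zip(bins[:-1], bins[1:]):
        mask = (x >= lo) & (x <= hi)
        if not mask.any():
            effects.append(0.0)
            continue
        X_lo, X_hi = X[mask].copy(), X[mask].copy()
        X_lo[:, feature], X_hi[:, feature] = lo, hi
        effects.append(np.mean(model.predict(X_hi) - model.predict(X_lo)))
    return np.cumsum(effects)                 # accumulated local effects

def bootstrap_ale(fit_model, X, y, feature, n_boot=200, seed=0):
    # fit_model: callable that fits and returns a model with .predict().
    rng = np.random.default_rng(seed)
    curves = []
    for _ in range(n_boot):
        idx = rng.integers(0, len(X), len(X))
        curves.append(ale_first_order(fit_model(X[idx], y[idx]), X, feature))
    return np.percentile(np.array(curves), [2.5, 97.5], axis=0)  # 95% band
```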
( 3
min )
We study a principal component analysis problem under the spiked Wishart
model in which the structure in the signal is captured by a class of
union-of-subspace models. This general class includes vanilla sparse PCA as
well as its variants with graph sparsity. With the goal of studying these
problems under a unified statistical and computational lens, we establish
fundamental limits that depend on the geometry of the problem instance, and
show that a natural projected power method exhibits local convergence to the
statistically near-optimal neighborhood of the solution. We complement these
results with end-to-end analyses of two important special cases given by path
and tree sparsity in a general basis, showing initialization methods and
matching evidence of computational hardness. Overall, our results indicate that
several of the phenomena observed for vanilla sparse PCA extend in a natural
fashion to its structured counterparts.
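For intuition, a projected power method of the kind described can be sketched
as follows, with a hard-thresholding projection as the vanilla-sparse-PCA
special case; the structured variants would substitute a different `project`:

```python
import numpy as np

def projected_power_method(Sigma, project, n_iter=100, seed=0):
    """Power iteration with a structural projection: after each multiply,
    `project` maps the iterate back onto the model set."""
    rng = np.random.default_rng(seed)
    v = project(rng.standard_normal(Sigma.shape[0]))
    v /= np.linalg.norm(v)
    for _ in range(n_iter):
        v = project(Sigma @ v)
        v /= np.linalg.norm(v)
    return v

def hard_threshold(s):
    # Vanilla sparse PCA: keep the s largest-magnitude coordinates.
    def proj(v):
        out = np.zeros_like(v)
        idx = np.argsort(np.abs(v))[-s:]
        out[idx] = v[idx]
        return out
    return proj

# v_hat = projected_power_method(sample_covariance, hard_threshold(10))
```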
( 2
min )
Self-supervised and language-supervised image models contain rich knowledge
of the world that is important for generalization. Many robotic tasks, however,
require a detailed understanding of 3D geometry, which is often lacking in 2D
image features. This work bridges this 2D-to-3D gap for robotic manipulation by
leveraging distilled feature fields to combine accurate 3D geometry with rich
semantics from 2D foundation models. We present a few-shot learning method for
6-DOF grasping and placing that harnesses these strong spatial and semantic
priors to achieve in-the-wild generalization to unseen objects. Using features
distilled from a vision-language model, CLIP, we present a way to designate
novel objects for manipulation via free-text natural language, and demonstrate
its ability to generalize to unseen expressions and novel categories of
objects.
( 2
min )
Federated Learning (FL) is a machine-learning approach enabling collaborative
model training across multiple decentralized edge devices that hold local data
samples, all without exchanging these samples. This collaborative process
occurs under the supervision of a central server orchestrating the training or
via a peer-to-peer network. The significance of FL is particularly pronounced
in industries such as healthcare and finance, where data privacy holds
paramount importance. However, training a model in the federated learning
setting brings forth several challenges, with one of the most prominent being
the heterogeneity of data distribution among the edge devices. The data is
typically non-independently and non-identically distributed (non-IID), thereby
presenting challenges to model convergence. This report delves into the issues
arising from non-IID and heterogeneous data and explores current algorithms
designed to address these challenges.
( 2
min )
Deep Learning models have shown success in a large variety of tasks by
extracting correlation patterns from high-dimensional data but still struggle
when generalizing out of their initial distribution. As causal engines aim to
learn mechanisms independent from a data distribution, combining Deep Learning
with Causality can have a great impact on the two fields. In this paper, we
further motivate this assumption. We perform an extensive overview of the
theories and methods for Causality from different perspectives, with an
emphasis on Deep Learning and the challenges met by the two domains. We show
early attempts to bring the fields together and the possible perspectives for
the future. We conclude by surveying a wide variety of applications of
techniques from Causality.
( 2
min )
There is a recent interest on first-order methods for linear programming
(LP). In this paper, we propose a stochastic algorithm using variance reduction
and restarts for solving sharp primal-dual problems such as LP. We show that
the proposed stochastic method exhibits a linear convergence rate for solving
sharp instances with a high probability. In addition, we propose an efficient
coordinate-based stochastic oracle for unconstrained bilinear problems, which
has $\mathcal O(1)$ per iteration cost and improves the complexity of the
existing deterministic and stochastic algorithms. Finally, we show that the
obtained linear convergence rate is nearly optimal (up to $\log$ terms) for a
wide class of stochastic primal-dual methods.
( 2
min )
Group equivariant non-expansive operators have been recently proposed as
basic components in topological data analysis and deep learning. In this paper
we study some geometric properties of the spaces of group equivariant operators
and show how a space $\mathcal{F}$ of group equivariant non-expansive operators
can be endowed with the structure of a Riemannian manifold, thereby enabling
the use of gradient descent methods for the minimization of cost functions on
$\mathcal{F}$. As an application of this approach, we also describe a procedure
to select a finite set of representative group equivariant non-expansive
operators in the considered manifold.
( 2
min )
We present an elementary yet general proof of duality for Wasserstein
distributionally robust optimization. The duality holds for an arbitrary
Kantorovich transport cost, measurable loss function, and nominal probability
distribution, provided that an interchangeability principle holds, which is
equivalent to certain measurability conditions. To illustrate the broader
applicability of our approach, we provide a rigorous treatment of duality
results in distributionally robust Markov decision processes and
distributionally robust multistage stochastic programming. Furthermore, we
extend the result to other problems including infinity-Wasserstein
distributionally robust optimization, risk-averse optimization, and globalized
distributionally robust counterpart.
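For orientation, one widely cited special case of such duality, with notation
assumed here rather than taken from the paper, is the strong dual of the
Wasserstein-ball problem:

```latex
\sup_{Q:\, W_c(Q,P) \le \rho} \mathbb{E}_{Q}[\ell(\xi)]
  \;=\; \inf_{\lambda \ge 0} \Big\{ \lambda \rho
  + \mathbb{E}_{P}\Big[ \sup_{\zeta} \big\{ \ell(\zeta) - \lambda\, c(\zeta, \xi) \big\} \Big] \Big\}
```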
( 2
min )
Multiscale stochastic dynamical systems have been widely applied to a variety
of scientific and engineering problems due to their capability of depicting
complex phenomena in many real-world applications. This work is devoted to
investigating the effective dynamics of slow-fast stochastic dynamical
systems. Given observation data over a short-term period satisfying some
unknown slow-fast stochastic system, we propose a novel algorithm, including a
neural network called Auto-SDE, to learn the invariant slow manifold. Our
approach captures the evolving dynamics through a series of time-dependent
autoencoder neural networks, with a loss constructed from a discretized
stochastic differential equation. Numerical experiments under various
evaluation metrics validate that our algorithm is accurate, stable, and
effective.
( 2
min )
In this paper we will discuss metalearning and how we can go beyond the
current classical learning paradigm. We will first address the importance of
inductive biases in the learning process and what is at stake: the quantities
of data necessary to learn. We will subsequently see the importance of choosing
suitable parameterizations to end up with well-defined learning processes.
Especially since in the context of real-world applications, we face numerous
biases due, e.g., to the specificities of sensors, the heterogeneity of data
sources, the multiplicity of points of view, etc. This will lead us to the
idea, which we published previously, of exploiting the structure of the
concepts to be learned in order to organize the learning process. We conclude by
discussing the perspectives around parameter-tying schemes and the emergence of
universal aspects in the models thus learned.
( 2
min )
Large language model (LLM) scaling laws are empirical formulas that estimate
changes in model quality as a result of increasing parameter count and training
data. However, these formulas, including the popular DeepMind Chinchilla
scaling laws, neglect to include the cost of inference. We modify the
Chinchilla scaling laws to calculate the optimal LLM parameter count and
pre-training data size to train and deploy a model of a given quality and
inference demand. We conduct our analysis both in terms of a compute budget and
real-world costs and find that LLM researchers expecting reasonably large
inference demand (~1B requests) should train models smaller and longer than
Chinchilla-optimal.
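The flavor of such an inference-adjusted analysis can be sketched with the
published Chinchilla loss fit and the standard FLOP approximations; the target
loss and inference demand below are assumed for illustration:

```python
# Chinchilla-style parametric loss L(N, D) = E + A/N^alpha + B/D^beta, using
# the commonly quoted Hoffmann et al. fit; constants are approximate.
E, A, B, alpha, beta = 1.69, 406.4, 410.7, 0.34, 0.28

def total_flops(N, D_train, n_inference_tokens):
    # Standard approximations: ~6*N*D FLOPs for training, ~2*N per token served.
    return 6 * N * D_train + 2 * N * n_inference_tokens

# Hypothetical sweep: for a fixed target loss, smaller models trained on more
# tokens become cheaper in total once inference demand is large.
target = 2.0                 # assumed quality target (loss)
demand = 2e12                # assumed demand: ~1B requests x ~2k tokens each
for N in [1e9, 3e9, 7e9, 13e9, 30e9, 70e9]:
    rem = target - E - A / N**alpha      # solve L(N, D) = target for D
    if rem <= 0:
        continue                          # this size cannot reach the target
    D = (B / rem) ** (1 / beta)
    print(f"N={N:.0e}  D={D:.2e}  total FLOPs={total_flops(N, D, demand):.2e}")
```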
( 2
min )
In this work, we consider the offline preference-based reinforcement learning
problem. We focus on the two-phase learning approach that is prevalent in
previous reinforcement learning from human preference works. We identify a
challenge in applying two-phase learning in the offline PBRL setting: the
learned utility model can be too hard for the learning agent to optimize during
the second learning phase. To overcome this challenge, we propose a two-phase
learning approach with behavior regularization through action clipping. The
insight is that the state-actions which are poorly covered by the dataset can
only provide limited information and increase the complexity of the problem in
the second learning phase. Our method ignores such state-actions during the
second learning phase to achieve higher learning efficiency. We empirically
verify that our method has high learning efficiency on a variety of datasets in
robotic control environments.
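A minimal sketch of the action-clipping idea, assuming continuous actions and
a nearest-neighbor notion of dataset support (the paper's exact mechanism may
differ):

```python
import numpy as np

def clip_to_dataset_support(action, state, data_states, data_actions,
                            k=10, eps=0.2):
    """Restrict a proposed action to an eps-box around the actions taken in
    the k logged states nearest to the current state, so that poorly covered
    state-actions are ignored during the second learning phase."""
    dists = np.linalg.norm(data_states - state, axis=1)
    neighbors = data_actions[np.argsort(dists)[:k]]
    lo, hi = neighbors.min(axis=0) - eps, neighbors.max(axis=0) + eps
    return np.clip(action, lo, hi)
```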
( 2
min )
Decision-making is a dynamic process requiring perception, memory, and
reasoning to make choices and find optimal policies. Traditional approaches to
decision-making suffer from poor sample efficiency and limited generalization,
while large-scale self-supervised pretraining has enabled fast adaptation with
fine-tuning or few-shot learning in language and vision. We thus argue for
integrating knowledge acquired from generic large-scale self-supervised
pretraining into downstream decision-making problems. We propose a
Pretrain-Then-Adapt pipeline and survey recent work on data collection,
pretraining objectives, and adaptation strategies for decision-making
pretraining and downstream inference. Finally, we identify critical challenges
and future directions for developing decision foundation models with the help
of generic and flexible self-supervised pretraining.
( 2
min )
In the realm of cryptocurrency, the prediction of Bitcoin prices has garnered
substantial attention due to its potential impact on financial markets and
investment strategies. This paper proposes a comparative study of hybrid
machine learning algorithms and focuses on enhancing model interpretability.
Specifically, we introduce linear regression (OLS, LASSO), long short-term
memory (LSTM), and decision tree regressors. Through grounded experiments, we
observe that the linear regressor achieves the best performance among the
candidate models. For interpretability, we carry out a systematic overview of
preprocessing techniques for time-series statistics, including decomposition,
the autocorrelation function, and triple exponential smoothing, which aim to
uncover latent relations and complex patterns in financial time-series
forecasting. We believe this work may attract more attention and inspire
further research in the realm of time-series analysis and its real-world
applications.
( 2
min )
This paper introduces an iterative algorithm designed to train additive
models with favorable memory storage and computational requirements. The
algorithm can be viewed as the functional counterpart of stochastic gradient
descent, applied to the coefficients of a truncated basis expansion of the
component functions. We show that the resulting estimator satisfies an oracle
inequality that allows for model misspecification. In the well-specified
setting, by choosing the learning rate carefully across three distinct stages
of training, we prove that its risk is minimax optimal in terms of the
dependence on the dimensionality of the data and the size of the training
sample.
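A toy sketch of the functional-SGD idea, assuming features scaled to [0, 1]
and an illustrative cosine basis; the staged learning-rate schedule from the
analysis is omitted:

```python
import numpy as np

def functional_sgd(X, y, n_basis=10, lr=0.01, n_epochs=5):
    """SGD on the coefficients of a truncated cosine-basis expansion of each
    component function in an additive model f(x) = sum_j f_j(x_j)."""
    n, d = X.shape
    coef = np.zeros((d, n_basis))             # one coefficient block per feature

    def basis(xi):                            # cosine basis on [0, 1]
        return np.cos(np.outer(xi, np.arange(n_basis)) * np.pi)

    for _ in range(n_epochs):
        for i in np.random.permutation(n):
            phi = basis(X[i])                 # (d, n_basis) basis evaluations
            pred = np.sum(coef * phi)
            coef -= lr * (pred - y[i]) * phi  # squared-loss gradient step
    return coef
```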
( 2
min )
Growth in the penetration of renewable energy sources makes supply more
uncertain and leads to an increase in the system imbalance. This trend,
together with the single imbalance pricing, opens an opportunity for balance
responsible parties (BRPs) to perform energy arbitrage in the imbalance
settlement mechanism. To this end, we propose a battery control framework based
on distributional reinforcement learning (DRL). Our proposed control framework
takes a risk-sensitive perspective, allowing BRPs to adjust their risk
preferences: we aim to optimize a weighted sum of the arbitrage profit and a
risk measure while constraining the daily number of cycles for the battery. We
assess the performance of our proposed control framework using the Belgian
imbalance prices of 2022 and compare two state-of-the-art RL methods, deep
Q-learning and soft actor-critic. Results reveal that the distributional soft
actor-critic method can outperform other methods. Moreover, we note that our
fully risk-averse agent appropriately learns to hedge against the risk related
to the unknown imbalance price by (dis)charging the battery only when the agent
is more certain about the price.
( 2
min )
We analyze a stochastic approximation algorithm for decision-dependent
problems, wherein the data distribution used by the algorithm evolves along the
iterate sequence. The primary examples of such problems appear in performative
prediction and its multiplayer extensions. We show that under mild assumptions,
the deviation between the average iterate of the algorithm and the solution is
asymptotically normal, with a covariance that clearly decouples the effects of
the gradient noise and the distributional shift. Moreover, building on the work
of H\'ajek and Le Cam, we show that the asymptotic performance of the algorithm
with averaging is locally minimax optimal.
( 2
min )
We develop a new efficient sequential approximate leverage score algorithm,
SALSA, using methods from randomized numerical linear algebra (RandNLA) for
large matrices. We demonstrate that, with high probability, the accuracy of
SALSA's approximations is within $(1 + O(\varepsilon))$ of the true leverage
scores. In addition, we show that the theoretical computational complexity and
numerical accuracy of SALSA surpass existing approximations. These theoretical
results are subsequently utilized to develop an efficient algorithm, named
LSARMA, for fitting an appropriate ARMA model to large-scale time series data.
Our proposed algorithm is, with high probability, guaranteed to find the
maximum likelihood estimates of the parameters for the true underlying ARMA
model. Furthermore, it has a worst-case running time that significantly
improves on those of the state-of-the-art alternatives in big-data regimes.
Empirical results on large-scale data strongly support these theoretical
results and underscore the efficacy of our new approach.
( 2
min )
Inventory Routing Problem (IRP) is a crucial challenge in supply chain
management as it involves optimizing efficient route selection while
considering the uncertainty of inventory demand planning. To solve IRPs,
usually a two-stage approach is employed, where demand is predicted using
machine learning techniques first, and then an optimization algorithm is used
to minimize routing costs. Our experiment shows machine learning models fall
short of achieving perfect accuracy because inventory levels are influenced by
the dynamic business environment, which, in turn, affects the optimization
problem in the next stage, resulting in sub-optimal decisions. In this paper,
we formulate and propose a decision-focused learning-based approach to solving
real-world IRPs. This approach directly integrates inventory prediction and
routing optimization within an end-to-end system, potentially ensuring a
robust supply chain strategy.
( 2
min )
2024 will be all about changing business models due to the massive disruption
of generative AI. There will be new winners and many losers. The incumbents
especially have a lot to lose – but permissionless innovation has always been
the hallmark of American innovation. We see the usual vanguard action from the
incumbents who find… (From "Generative AI business model disruption: The NYT
lawsuit posturing", Data Science Central.)
( 20
min )
An MIT panel charts how artificial intelligence will impact art and design.
( 10
min )
An avid cyclist, Thomas Park knows the value of having lots of gears to maintain a smooth, fast ride. So, when the software architect designed an AI inference platform to serve predictions for Oracle Cloud Infrastructure’s (OCI) Vision AI service, he picked NVIDIA Triton Inference Server. That’s because it can shift up, down or sideways…
( 6
min )
A new year means new creative opportunities and new "In the NVIDIA Studio" beats.
( 7
min )
We introduce SOLAR 10.7B, a large language model (LLM) with 10.7 billion
parameters, demonstrating superior performance in various natural language
processing (NLP) tasks. Inspired by recent efforts to efficiently up-scale
LLMs, we present a method for scaling LLMs called depth up-scaling (DUS), which
encompasses depthwise scaling and continued pretraining. In contrast to other
LLM up-scaling methods that use mixture-of-experts, DUS does not require
complex changes to train and inference efficiently. We show experimentally that
DUS is simple yet effective in scaling up high-performance LLMs from small
ones. Building on the DUS model, we additionally present SOLAR 10.7B-Instruct,
a variant fine-tuned for instruction-following capabilities, surpassing
Mixtral-8x7B-Instruct. SOLAR 10.7B is publicly available under the Apache 2.0
license, promoting broad access and application in the LLM field.
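A schematic of the depthwise-scaling step as described (duplicate the layer
stack, then drop the overlapping m layers; 32 layers become 48 when m = 8);
the continued-pretraining stage is omitted:

```python
import copy
import torch.nn as nn

def depth_up_scale(base_layers: nn.ModuleList, m: int = 8) -> nn.ModuleList:
    """Duplicate the transformer layer stack, drop the last m layers of the
    first copy and the first m layers of the second, then concatenate.
    Continued pretraining of the merged stack is assumed to follow."""
    first = list(base_layers)[:len(base_layers) - m]
    second = list(copy.deepcopy(base_layers))[m:]   # fresh copies of layers m..n-1
    return nn.ModuleList(first + second)
```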
( 2
min )
The Earth mover's distance (EMD) is a useful metric for image recognition and
classification, but its usual implementations are not differentiable or too
slow to be used as a loss function for training other algorithms via gradient
descent. In this paper, we train a convolutional neural network (CNN) to learn
a differentiable, fast approximation of the EMD and demonstrate that it can be
used as a substitute for computing-intensive EMD implementations. We apply this
differentiable approximation in the training of an autoencoder-inspired neural
network (encoder NN) for data compression at the high-luminosity LHC at CERN.
The goal of this encoder NN is to compress the data while preserving the
information related to the distribution of energy deposits in particle
detectors. We demonstrate that the performance of our encoder NN trained using
the differentiable EMD CNN surpasses that of training with loss functions based
on mean squared error.
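The two-stage idea can be sketched as follows: a small CNN is first regressed
onto exact EMD values and then frozen and used as a differentiable loss; the
architecture is an illustrative stand-in for the paper's network:

```python
import torch
import torch.nn as nn

class EMDApprox(nn.Module):
    """Small CNN mapping a pair of 2D intensity maps to a scalar, trained to
    regress precomputed exact EMD values (architecture is illustrative)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(2, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, 1))

    def forward(self, a, b):
        # a, b: (batch, 1, H, W) intensity maps.
        return self.net(torch.cat([a, b], dim=1)).squeeze(-1)

def encoder_loss(frozen_emd_net, encoder, decoder, x):
    # Stage 2: the frozen approximation acts as a differentiable loss, so
    # gradients flow through it into the encoder being trained.
    recon = decoder(encoder(x))
    return frozen_emd_net(recon, x).mean()
```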
( 3
min )
Sound event detection (SED), as a core module of acoustic environmental
analysis, suffers from the problem of data deficiency. The integration of
semi-supervised learning (SSL) largely mitigates such problem while bringing no
extra annotation budget. This paper investigates several core modules of SSL
and introduces a random consistency training (RCT) strategy. First, a
self-consistency loss is proposed to fuse with the teacher-student model to
stabilize the training. Second, a hard mixup data augmentation is proposed to
account for the additive property of sounds. Third, a random augmentation
scheme is applied to flexibly combine different types of data augmentations.
Experiments show that the proposed strategy outperforms other widely used
strategies.
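Of the three ingredients, the hard mixup is easiest to illustrate; the sketch
below sums waveforms and takes the union of multi-hot labels, exploiting the
additive property of sounds (details of the paper's variant may differ):

```python
import torch

def hard_mixup(waveforms, labels):
    """Sum two clips directly and take the union of their multi-hot labels,
    instead of the usual convex interpolation of inputs and targets."""
    perm = torch.randperm(waveforms.size(0))
    mixed_x = waveforms + waveforms[perm]                    # superposition
    mixed_y = torch.clamp(labels + labels[perm], max=1.0)    # label union
    return mixed_x, mixed_y
```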
( 2
min )
Semi-supervised learning (SSL) approaches have been successfully applied in a
wide range of engineering and scientific fields. This paper investigates the
generative model framework with a missingness mechanism for unclassified
observations, as introduced by Ahfock and McLachlan (2020). We show that in a
partially classified sample, a classifier using Bayes rule of allocation with a
missing-data mechanism can surpass a fully supervised classifier in a two-class
normal homoscedastic model, especially with moderate to low overlap and
proportion of missing class labels, or with large overlap but few missing
labels. It also outperforms a classifier with no missing-data mechanism
regardless of the overlap region or the proportion of missing class labels. Our
exploration of two- and three-component normal mixture models with unequal
covariances through simulations further corroborates our findings. Finally, we
illustrate the use of the proposed classifier with a missing-data mechanism on
interneuronal and skin lesion datasets.
( 2
min )
This paper introduces AIJack, an open-source library designed to assess
security and privacy risks associated with the training and deployment of
machine learning models. Amid the growing interest in big data and AI,
advancements in machine learning research and business are accelerating.
However, recent studies reveal potential threats, such as the theft of training
data and the manipulation of models by malicious attackers. Therefore, a
comprehensive understanding of machine learning's security and privacy
vulnerabilities is crucial for the safe integration of machine learning into
real-world products. AIJack aims to address this need by providing a library
with various attack and defense methods through a unified API. The library is
publicly available on GitHub (https://github.com/Koukyosyumei/AIJack).
( 2
min )
The Davis-Kahan-Wedin $\sin \Theta$ theorem describes how the singular
subspaces of a matrix change when subjected to a small perturbation. This
classic result is sharp in the worst case scenario. In this paper, we prove a
stochastic version of the Davis-Kahan-Wedin $\sin \Theta$ theorem when the
perturbation is a Gaussian random matrix. Under certain structural assumptions,
we obtain an optimal bound that significantly improves upon the classic
Davis-Kahan-Wedin $\sin \Theta$ theorem. One of our key tools is a new
perturbation bound for the singular values, which may be of independent
interest.
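For reference, one standard form of the deterministic bound being improved
upon, stated here with assumed notation for the top-$k$ left singular
subspaces $U$ of $M$ and $\widehat{U}$ of $M + E$, is

```latex
\big\| \sin \Theta(\widehat{U}, U) \big\|
  \;\le\; \frac{2\,\|E\|}{\sigma_k(M) - \sigma_{k+1}(M)},
```

provided the singular value gap in the denominator is positive.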
( 2
min )
This paper presents a systematic literature review (SLR) on the
explainability and interpretability of machine learning (ML) models within the
context of predictive process mining, using the PRISMA framework. Given the
rapid advancement of artificial intelligence (AI) and ML systems, understanding
the "black-box" nature of these technologies has become increasingly critical.
Focusing specifically on the domain of process mining, this paper delves into
the challenges of interpreting ML models trained with complex business process
data. We differentiate between intrinsically interpretable models and those
that require post-hoc explanation techniques, providing a comprehensive
overview of the current methodologies and their applications across various
domains. Through a rigorous bibliographic analysis, this research
offers a detailed synthesis of the state of explainability and interpretability
in predictive process mining, identifying key trends, challenges, and future
directions. Our findings aim to equip researchers and practitioners with a
deeper understanding of how to develop and implement more trustworthy,
transparent, and effective intelligent systems for predictive process
analytics.
( 2
min )
Time series forecasting plays a crucial role in diverse fields, necessitating
the development of robust models that can effectively handle complex temporal
patterns. In this article, we present a novel feature selection method embedded
in Long Short-Term Memory networks, leveraging a multi-objective evolutionary
algorithm. Our approach optimizes the weights and biases of the LSTM in a
partitioned manner, with each objective function of the evolutionary algorithm
targeting the root mean square error in a specific data partition. The set of
non-dominated forecast models identified by the algorithm is then utilized to
construct a meta-model through stacking-based ensemble learning. Furthermore,
our proposed method provides an avenue for attribute importance determination,
as the frequency of selection for each attribute in the set of non-dominated
forecasting models reflects their significance. This attribute importance
insight adds an interpretable dimension to the forecasting process.
Experimental evaluations on air quality time series data from Italy and
southeast Spain demonstrate that our method substantially improves the
generalization ability of conventional LSTMs, effectively reducing overfitting.
Comparative analyses against state-of-the-art CancelOut and EAR-FS methods
highlight the superior performance of our approach.
( 2
min )
Approximate Computing (AxC) techniques have become increasingly popular in
trading off accuracy for performance gains in various applications. Selecting
the best AxC techniques for a given application is challenging. Among proposed
approaches for exploring the design space, Machine Learning approaches such as
Reinforcement Learning (RL) show promising results. In this paper, we propose
an RL-based multi-objective Design Space Exploration strategy to find the
approximate versions of the application that balance accuracy degradation and
power and computation time reduction. Our experimental results show a good
trade-off between accuracy degradation and decreased power and computation time
for some benchmarks.
( 2
min )
Online display advertising platforms service numerous advertisers by
providing real-time bidding (RTB) for the scale of billions of ad requests
every day. The bidding strategy handles ad requests across multiple channels to
maximize the number of clicks under the set financial constraints, i.e., total
budget and cost-per-click (CPC), etc. Different from existing works mainly
focusing on single channel bidding, we explicitly consider cross-channel
constrained bidding with budget allocation. Specifically, we propose a
hierarchical offline deep reinforcement learning (DRL) framework called
``HiBid'', consisting of a high-level planner equipped with an auxiliary loss
for
non-competitive budget allocation, and a data augmentation enhanced low-level
executor for adaptive bidding strategy in response to allocated budgets.
Additionally, a CPC-guided action selection mechanism is introduced to satisfy
the cross-channel CPC constraint. Through extensive experiments on both the
large-scale log data and online A/B testing, we confirm that HiBid outperforms
six baselines in terms of the number of clicks, CPC satisfactory ratio, and
return-on-investment (ROI). We have also deployed HiBid on the Meituan
advertising platform, where it already serves tens of thousands of advertisers
every day.
( 2
min )
In many real-world problems, there is a limited set of training data, but an
abundance of unlabeled data. We propose a new method, Generative Posterior
Networks (GPNs), that uses unlabeled data to estimate epistemic uncertainty in
high-dimensional problems. A GPN is a generative model that, given a prior
distribution over functions, approximates the posterior distribution directly
by regularizing the network towards samples from the prior. We prove
theoretically that our method indeed approximates the Bayesian posterior and
show empirically that it improves epistemic uncertainty estimation and
scalability over competing methods.
( 2
min )
Adaptive optimization methods are widely recognized as among the most popular
approaches for training Deep Neural Networks (DNNs). Techniques such as Adam,
AdaGrad, and AdaHessian utilize a preconditioner that modifies the search
direction by incorporating information about the curvature of the objective
function. However, despite their adaptive characteristics, these methods still
require manual fine-tuning of the step-size. This, in turn, impacts the time
required to solve a particular problem. This paper presents an optimization
framework named SANIA to tackle these challenges. Beyond eliminating the need
for manual step-size hyperparameter settings, SANIA incorporates techniques to
address poorly scaled or ill-conditioned problems. We also explore several
preconditioning methods, including Hutchinson's method, which approximates the
Hessian diagonal of the loss function. We conclude with an extensive empirical
examination of the proposed techniques across classification tasks, covering
both convex and non-convex contexts.
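Hutchinson's diagonal estimator, one of the preconditioners mentioned, can be
sketched with Hessian-vector products via double backpropagation; this is a
generic sketch, not SANIA's implementation:

```python
import torch

def hutchinson_hessian_diag(loss_fn, params, n_samples=10):
    """Estimate diag(H) by averaging z * (Hz) over random Rademacher vectors
    z, with Hessian-vector products obtained via double backpropagation."""
    flat_dim = sum(p.numel() for p in params)
    diag = torch.zeros(flat_dim)
    for _ in range(n_samples):
        z = torch.randint(0, 2, (flat_dim,)).float() * 2 - 1   # Rademacher
        loss = loss_fn()
        grads = torch.autograd.grad(loss, params, create_graph=True)
        g = torch.cat([gr.reshape(-1) for gr in grads])
        hz = torch.autograd.grad(g @ z, params)                # Hz via backprop
        diag += z * torch.cat([h.reshape(-1) for h in hz]) / n_samples
    return diag
```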
( 2
min )
This paper introduces Auto-modeling of Formal Verification with Real-world
Prompting for 5G and NextG protocols (AVRE), a novel system designed for the
formal verification of Next Generation (NextG) communication protocols,
addressing the increasing complexity and scalability challenges in network
protocol design and verification. Utilizing Large Language Models (LLMs), AVRE
transforms protocol descriptions into dependency graphs and formal models,
efficiently resolving ambiguities and capturing design intent. The system
integrates a transformer model with LLMs to autonomously establish quantifiable
dependency relationships through cross- and self-attention mechanisms. Enhanced
by iterative feedback from the HyFuzz experimental platform, AVRE significantly
advances the accuracy and relevance of formal verification in complex
communication protocols, offering a groundbreaking approach to validating
sophisticated communication systems. We compare CAL's performance with
state-of-the-art LLM-based models and traditional time sequence models,
demonstrating its superiority in accuracy and robustness, achieving an accuracy
of 95.94\% and an AUC of 0.98. This NLP-based approach enables, for the first
time, the creation of exploits directly from design documents, making
remarkable progress in scalable system verification and validation.
( 2
min )
Although database systems perform well in data access and manipulation, their
relational model hinders data scientists from formulating machine learning
algorithms in SQL. Nevertheless, we argue that modern database systems perform
well for machine learning algorithms expressed in relational algebra. To
overcome the barrier of the relational model, this paper shows how to transform
data into a relational representation for training neural networks in SQL: We
first describe building blocks for data transformation, model training and
inference in SQL-92 and their counterparts using an extended array data type.
Then, we compare the implementation for model training and inference using
array data types to the one using a relational representation in SQL-92 only.
The evaluation in terms of runtime and memory consumption proves the
suitability of modern database systems for matrix algebra, although specialised
array data types perform better than matrices in relational representation.
( 2
min )
Understanding intermediate representations of the concepts learned by deep
learning classifiers is indispensable for interpreting general model behaviors.
Existing approaches to reveal learned concepts often rely on human supervision,
such as pre-defined concept sets or segmentation processes. In this paper, we
propose a novel unsupervised method for discovering distributed representations
of concepts by selecting a principal subset of neurons. Our empirical findings
demonstrate that instances with similar neuron activation states tend to share
coherent concepts. Based on the observations, the proposed method selects
principal neurons that construct an interpretable region, namely a Relaxed
Decision Region (RDR), encompassing instances with coherent concepts in the
feature space. It can be utilized to identify unlabeled subclasses within data
and to detect the causes of misclassifications. Furthermore, the applicability
of our method across various layers discloses distinct distributed
representations over the layers, which provides deeper insights into the
internal mechanisms of the deep learning model.
( 2
min )
In healthcare, patient data is often collected as multivariate time series,
providing a comprehensive view of a patient's health status over time. While
this data can be sparse, connected devices may enhance its frequency. The goal
is to create patient profiles from these time series. In the absence of labels,
a predictive model can be used to predict future values while forming a latent
cluster space, evaluated based on predictive performance. We compare two
models on Withings' datasets: MagmaClust, which clusters entire time series,
and DGM${}^2$, which allows the group affiliation of an individual to change
over time (dynamic clustering).
( 2
min )
In this research, we introduce RefineNet, a novel architecture designed to
address resolution limitations in text-to-image conversion systems. We explore
the challenges of generating high-resolution images from textual descriptions,
focusing on the trade-offs between detail accuracy and computational
efficiency. RefineNet leverages a hierarchical Transformer combined with
progressive and conditional refinement techniques, outperforming existing
models in producing detailed and high-quality images. Through extensive
experiments on diverse datasets, we demonstrate RefineNet's superiority in
clarity and resolution, particularly in complex image categories like animals,
plants, and human faces. Our work not only advances the field of text-to-image
conversion but also opens new avenues for high-fidelity image generation in
various applications.
( 2
min )
Engineering system design, viewed as a decision-making process, faces
challenges due to complexity and uncertainty. In this paper, we present a
framework proposing the use of the Deep Q-learning algorithm to optimize the
design of engineering systems. We outline a step-by-step framework for
optimizing engineering system designs. The goal is to find policies that
maximize the output of a simulation model given multiple sources of
uncertainties. The proposed algorithm handles linear and non-linear multi-stage
stochastic problems, where decision variables are discrete, and the objective
function and constraints are assessed via a Monte Carlo simulation. We
demonstrate the effectiveness of our proposed framework by solving two
engineering system design problems in the presence of multiple uncertainties,
such as price and demand.
( 2
min )
This work proposes $\mu$GUIDE: a general Bayesian framework to estimate
posterior distributions of tissue microstructure parameters from any given
biophysical model or MRI signal representation, with exemplar demonstration in
diffusion-weighted MRI. Harnessing a new deep learning architecture for
automatic signal feature selection combined with simulation-based inference and
efficient sampling of the posterior distributions, $\mu$GUIDE bypasses the high
computational and time cost of conventional Bayesian approaches and does not
rely on acquisition constraints to define model-specific summary statistics.
The obtained posterior distributions make it possible to highlight
degeneracies present in the model definition and to quantify the uncertainty
and ambiguity of the estimated parameters.
( 2
min )
To plan and optimize energy storage demands that account for Li-ion battery
aging dynamics, techniques need to be developed to diagnose battery internal
states accurately and rapidly. This study seeks to reduce the computational
resources needed to determine a battery's internal states by replacing
physics-based Li-ion battery models -- such as the single-particle model (SPM)
and the pseudo-2D (P2D) model -- with a physics-informed neural network (PINN)
surrogate. The surrogate model makes high-throughput techniques, such as
Bayesian calibration, tractable to determine battery internal parameters from
voltage responses. This manuscript is the first of a two-part series that
introduces PINN surrogates of Li-ion battery models for parameter inference
(i.e., state-of-health diagnostics). In this first part, a method is presented
for constructing a PINN surrogate of the SPM. A multi-fidelity hierarchical
training, where several neural nets are trained with multiple physics-loss
fidelities, is shown to significantly improve the surrogate accuracy when
training only on the governing-equation residuals. The implementation is made
available in a companion repository (https://github.com/NREL/pinnstripes). The
techniques used to develop a PINN surrogate of the SPM are extended in Part II
to the PINN surrogate for the P2D battery model, which also explores the
Bayesian calibration capabilities of both surrogates.
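A generic PINN training loss of this kind, sketched for the SPM's spherical
solid-phase diffusion with illustrative constants and with boundary terms
omitted (not the NREL implementation), looks as follows:

```python
import torch
import torch.nn as nn

# Network maps (t, r) to solid-phase lithium concentration; constants and
# the omission of boundary/flux terms are illustrative simplifications.
net = nn.Sequential(nn.Linear(2, 64), nn.Tanh(),
                    nn.Linear(64, 64), nn.Tanh(), nn.Linear(64, 1))

def spm_residual(t, r, D=1e-14):
    # Spherical diffusion residual: c_t - D * (c_rr + (2/r) * c_r).
    tr = torch.stack([t, r], dim=1).requires_grad_(True)
    c = net(tr)
    grads = torch.autograd.grad(c.sum(), tr, create_graph=True)[0]
    c_t, c_r = grads[:, 0], grads[:, 1]
    c_rr = torch.autograd.grad(c_r.sum(), tr, create_graph=True)[0][:, 1]
    return c_t - D * (c_rr + 2.0 / r * c_r)

def pinn_loss(t_d, r_d, c_d, t_col, r_col, w_phys=1.0):
    # Data misfit on measurements plus physics residual at collocation points.
    pred = net(torch.stack([t_d, r_d], dim=1)).squeeze(-1)
    data_fit = ((pred - c_d) ** 2).mean()
    phys = (spm_residual(t_col, r_col) ** 2).mean()
    return data_fit + w_phys * phys
```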
( 3
min )
Domain generalization focuses on leveraging knowledge from multiple related
domains with ample training data and labels to enhance inference on unseen
in-distribution (IN) and out-of-distribution (OOD) domains. In our study, we
introduce a two-phase representation learning technique using multi-task
learning. This approach aims to cultivate a latent space from features spanning
multiple domains, encompassing both native and cross-domains, to amplify
generalization to IN and OOD territories. Additionally, we attempt to
disentangle the latent space by minimizing the mutual information between the
prior and latent space, effectively de-correlating spurious feature
correlations. Collectively, the joint optimization will facilitate
domain-invariant feature learning. We assess the model's efficacy across
multiple cybersecurity datasets, using standard classification metrics on both
unseen IN and OOD sets, and juxtapose the results with contemporary domain
generalization methods.
( 2
min )
Despite the success of graph neural networks (GNNs) in various domains, they
exhibit susceptibility to adversarial attacks. Understanding these
vulnerabilities is crucial for developing robust and secure applications. In
this paper, we investigate the impact of test time adversarial attacks through
edge perturbations which involve both edge insertions and deletions. A novel
explainability-based method is proposed to identify important nodes in the
graph and perform edge perturbation between these nodes. The proposed method is
tested for node classification with three different architectures and datasets.
The results suggest that introducing edges between nodes of different classes
has a higher impact than removing edges among nodes within the same class.
( 2
min )
Bayesian parameter inference is useful to improve Li-ion battery diagnostics
and can help formulate battery aging models. However, it is computationally
intensive and cannot be easily repeated for multiple cycles, multiple operating
conditions, or multiple replicate cells. To reduce the computational cost of
Bayesian calibration, numerical solvers for physics-based models can be
replaced with faster surrogates. A physics-informed neural network (PINN) is
developed as a surrogate for the pseudo-2D (P2D) battery model calibration. For
the P2D surrogate, additional training regularization was needed as compared to
the PINN single-particle model (SPM) developed in Part I. Both the PINN SPM and
P2D surrogate models are exercised for parameter inference and compared to data
obtained from a direct numerical solution of the governing equations. A
parameter inference study highlights the ability to use these PINNs to
calibrate scaling parameters for the cathode Li diffusion and the anode
exchange current density. By realizing computational speed-ups of 2250x for the
P2D model, as compared to using standard integrating methods, the PINN
surrogates enable rapid state-of-health diagnostics. In the low-data
availability scenario, the testing error was estimated at 2 mV for the SPM
surrogate and 10 mV for the P2D surrogate, which could be mitigated with
additional data.
( 3
min )
Generalization remains a major problem in supervised learning of
single-channel speech enhancement. In this work, we propose learnable loss
mixup (LLM), a simple and effortless training scheme, to improve the
generalization of deep learning-based speech enhancement models. Loss mixup, of
which learnable loss mixup is a special variant, optimizes a mixture of the
loss functions of random sample pairs to train a model on virtual training data
constructed from these pairs of samples. In learnable loss mixup, by
conditioning on the mixed data, the loss functions are mixed using a non-linear
mixing function automatically learned via neural parameterization. Our
experimental results on the VCTK benchmark show that learnable loss mixup
achieves 3.26 PESQ, outperforming the state-of-the-art.
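A sketch of the learnable-loss-mixup idea under simplifying assumptions
(vector inputs, a per-sample `loss_fn` with reduction='none'; the paper's
mixing network and conditioning may differ):

```python
import torch
import torch.nn as nn

class LearnableLossMixup(nn.Module):
    """Mix the per-sample losses of random sample pairs with a weight that a
    small network predicts from the mixed input, instead of a fixed
    Beta-sampled coefficient."""
    def __init__(self, in_dim):
        super().__init__()
        self.gate = nn.Sequential(nn.Linear(in_dim, 32), nn.ReLU(),
                                  nn.Linear(32, 1), nn.Sigmoid())

    def forward(self, model, loss_fn, x, y):
        perm = torch.randperm(x.size(0))
        lam = torch.rand(x.size(0), *([1] * (x.dim() - 1)), device=x.device)
        x_mix = lam * x + (1 - lam) * x[perm]         # virtual training input
        w = self.gate(x_mix.flatten(1)).squeeze(-1)   # learned mixing weight
        out = model(x_mix)
        return (w * loss_fn(out, y) + (1 - w) * loss_fn(out, y[perm])).mean()
```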
( 2
min )
Machine learning and data mining techniques are utilized for enhancing the
security of any network. Researchers have used machine learning for pattern
detection, anomaly detection, dynamic policy setting, etc. The methods allow
the program to learn from data and make decisions without human intervention,
consuming a huge training period and computation power. This paper discusses a
novel technique to predict an upcoming attack in a network based on several
data parameters. The dataset is continuous in real-time implementation. The
proposed model comprises dataset pre-processing and training, followed by a
testing phase. Based on the results of the testing phase, the best model is
selected and used to extract the event class that may lead to an attack. The
event statistics are then used for attack prediction.
( 2
min )
The 0/1 matrix factorization defines matrix products using logical AND and OR
as product-sum operators, revealing the factors influencing various decision
processes. Instances and their characteristics are arranged in rows and
columns. Formulating matrix factorization as an energy minimization problem and
exploring it with Simulated Annealing (SA) theoretically enables finding a
minimum solution in sufficient time. However, searching for the optimal
solution in practical time becomes problematic when the energy landscape has
many plateaus with flat slopes. In this work, we propose a method to facilitate
the solution process by applying a gradient to the energy landscape, using a
rectified linear type cost function readily available in modern annealing
machines. We also propose a method to quickly obtain a solution by updating the
cost function's gradient during the search process. Numerical experiments were
conducted, confirming the method's effectiveness with both noise-free
artificial and real data.
( 2
min )
This paper describes a machine learning method to automate reading of cockpit
gauges, using a CNN to invert affine transformations and deduce aircraft states
from instrument images. Validated with synthetic images of a turn-and-bank
indicator, this research introduces methods such as generating datasets from a
single image, the 'Clean Training Principle' for optimal noise-free training,
and CNN interpolation for continuous value predictions from categorical data.
It also offers insights into hyperparameter optimization and ML system software
engineering.
( 2
min )
Personalized Federated Learning (PFL) relies on collective data knowledge to
build customized models. However, non-IID data between clients poses
significant challenges, as collaborating with clients who have diverse data
distributions can harm local model performance, especially with limited
training data. To address this issue, we propose FedACS, a new PFL algorithm
with an Attention-based Client Selection mechanism. FedACS integrates an
attention mechanism to enhance collaboration among clients with similar data
distributions and mitigate the data scarcity issue. It prioritizes and
allocates resources based on data similarity. We further establish the
theoretical convergence behavior of FedACS. Experiments on CIFAR10 and FMNIST
validate FedACS's superiority: by tackling non-IID data challenges and data
scarcity, FedACS offers promising advances in the field of personalized
federated learning.
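The attention-based selection can be sketched as scoring candidate clients by
the similarity of their model updates and aggregating with softmax weights;
the similarity measure and temperature are assumptions, not FedACS's exact
design:

```python
import torch
import torch.nn.functional as F

def attention_client_weights(query_update, client_updates, temp=0.5):
    """Score candidate clients by the cosine similarity of their flattened
    model updates to the querying client's update, then turn scores into
    softmax weights so similar data distributions dominate aggregation."""
    q = F.normalize(query_update, dim=0)                  # (D,)
    K = F.normalize(torch.stack(client_updates), dim=1)   # (C, D)
    return F.softmax(K @ q / temp, dim=0)                 # one weight per client

# Hypothetical personalized aggregation for one client:
# w = attention_client_weights(my_delta, all_deltas)
# my_new_delta = (w.unsqueeze(1) * torch.stack(all_deltas)).sum(0)
```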
( 2
min )
It’s not technology advancements that are the game-changers. The game-changer
is how those technological advancements are leveraged to economically
transform industries and society. 2024 is going to be a big year, especially
in the realm of Artificial Intelligence (AI). Generative AI (GenAI) has lit a
fire under organizations that suddenly have a senior management and… (From
"GenAI: Beware the Productivity Trap; It’s About Economics – Part 1", Data
Science Central.)
( 22
min )
We propose StyleCap, a method to generate natural language descriptions of
speaking styles appearing in speech. Although most conventional techniques for
para-/non-linguistic information recognition focus on category classification
or intensity estimation over pre-defined labels, they cannot explain the
reasoning behind the recognition result in an interpretable manner.
StyleCap is a first step towards an end-to-end method for generating
speaking-style prompts from speech, i.e., automatic speaking-style captioning.
StyleCap is trained with paired data of speech and natural language
descriptions. We train neural networks that convert a speech representation
vector into prefix vectors that are fed into a large language model (LLM)-based
text decoder. We explore an appropriate text decoder and speech feature
representation suitable for this new task. The experimental results demonstrate
that StyleCap, when leveraging richer LLMs for the text decoder, speech
self-supervised learning (SSL) features, and sentence-rephrasing augmentation,
improves the accuracy and diversity of generated speaking-style captions.
Samples of speaking-style captions generated by our StyleCap are publicly
available.
( 2
min )
In this note, we consider the highly nonconvex optimization problem
associated with computing the rank decomposition of symmetric tensors. We
formulate the invariance properties of the loss function and show that critical
points detected by standard gradient-based methods are \emph{symmetry breaking}
with respect to the target tensor. This phenomenon, observed for different
choices of target tensors and norms, enables the use of recently developed
analytic and algebraic tools for studying nonconvex optimization landscapes
that exhibit symmetry breaking of a similar nature.
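For concreteness, a minimal sketch of the optimization in question (our illustration, with arbitrary target, step size, and iteration count): fitting a rank-one symmetric approximation $w^{\otimes 3}$ to a symmetric target tensor by gradient descent.

```python
import numpy as np

# Minimal sketch: rank-1 symmetric fit w^{otimes 3} ~ T by gradient descent.
# Target, step size, and iteration count are illustrative assumptions.
rng = np.random.default_rng(0)
n = 5
v = rng.normal(size=n)
T = np.einsum("i,j,k->ijk", v, v, v)            # symmetric rank-1 target

w = rng.normal(size=n)
for _ in range(2000):
    R = np.einsum("i,j,k->ijk", w, w, w) - T    # symmetric residual tensor
    grad = 6 * np.einsum("ijk,j,k->i", R, w, w) # gradient of ||w^{o3}-T||_F^2
    w -= 0.005 * grad

# gradient descent may stop at a symmetry-breaking critical point rather
# than at +/- v itself; the alignment below need not be 1
print("alignment:", abs(w @ v) / (np.linalg.norm(w) * np.linalg.norm(v)))
```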
( 2
min )
Despite the breakthroughs in biomarker discovery facilitated by differential
gene analysis, challenges remain, particularly at the single-cell level.
Traditional methodologies heavily rely on user-supplied cell annotations,
focusing on individual expression data and often neglecting the critical
interactions between biological conditions, such as healthy versus diseased
states. In response, here we introduce scBeacon, an innovative framework built
upon a deep contrastive siamese network. scBeacon pioneers an unsupervised
approach, adeptly identifying matched cell populations across varied
conditions, enabling a refined differential gene analysis. By utilizing a
VQ-VAE framework, a contrastive siamese network, and a greedy iterative
strategy, scBeacon effectively pinpoints differential genes that hold potential
as key biomarkers. Comprehensive evaluations on a diverse array of datasets
validate scBeacon's superiority over existing single-cell differential gene
analysis tools. Its precision and adaptability underscore its significant role
in enhancing diagnostic accuracy in biomarker discovery. Given the importance
of biomarkers in diagnosis, scBeacon is positioned to be a pivotal asset in the
evolution of personalized medicine and targeted treatments.
( 2
min )
Federated bilevel optimization (FBO) has shown great potential recently in
machine learning and edge computing due to the emerging nested optimization
structure in meta-learning, fine-tuning, hyperparameter tuning, etc. However,
existing FBO algorithms often involve complicated computations and require
multiple sub-loops per iteration, each of which contains a number of
communication rounds. In this paper, we propose a simple and flexible FBO
framework named SimFBO, which is easy to implement without sub-loops, and
includes a generalized server-side aggregation and update for improving
communication efficiency. We further propose System-level heterogeneity robust
FBO (ShroFBO) as a variant of SimFBO with stronger resilience to heterogeneous
local computation. We show that SimFBO and ShroFBO provably achieve a linear
convergence speedup with partial client participation and client sampling
without replacement, as well as improved sample and communication complexities.
Experiments demonstrate the effectiveness of the proposed methods over existing
FBO algorithms.
( 2
min )
The rising popularity of artificial intelligence in healthcare is
highlighting the problem that a computational model achieving super-human
clinical performance at its training sites may perform substantially worse at
new sites. In this perspective, we present common sources for this failure to
transport, which we divide into sources under the control of the experimenter
and sources inherent to the clinical data-generating process. Among the
inherent sources, we look more closely at site-specific clinical practices that can
affect the data distribution, and propose a potential solution intended to
isolate the imprint of those practices on the data from the patterns of disease
cause and effect that are the usual target of probabilistic clinical models.
( 2
min )
We present a new high-level synthesis methodology for using large language
model tools to generate hardware designs. The methodology uses exclusively
open-source tools excluding the large language model. As a case study, we use
our methodology to generate a permuted congruential random number generator
design with a wishbone interface. We verify the functionality and quality of
the random number generator design using large language model-generated
simulations and the Dieharder randomness test suite. We document all the large
language model chat logs, Python scripts, Verilog scripts, and simulation
results used in the case study. We believe that our method of hardware design
generation coupled with the open source silicon 130 nm design tools will
revolutionize application-specific integrated circuit design. Our methodology
significantly lowers the barrier to entry for building domain-specific computing
accelerators for the Internet of Things and proof of concept prototypes for
later fabrication in more modern process nodes.
( 2
min )
To benefit from the modeling capacity of deep models in system identification
without worrying about inference time, this study presents a novel training
strategy that uses deep models only at the training stage. For this purpose,
two separate models with different structures and goals are employed. The
first, called the teacher model, is a deep generative model that models the
distribution of the system output(s); the second, named the student model, is
a shallow basis-function model fed by the system input(s) to predict the
system output(s). These two isolated paths must therefore reach the same
ultimate target. Since deep models perform well in modeling highly nonlinear
systems, aligning the representation spaces learned by the two models lets the
student model inherit the approximation power of the teacher model. The
proposed objective function consists of the individual student and teacher
objectives plus a distance penalty between the learned latent representations.
Simulation results on three nonlinear benchmarks show performance comparable
to deep architectures applied to the same benchmarks. Algorithmic transparency
and structural efficiency are also achieved as byproducts.
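A minimal sketch of the described objective (our own illustration; the architectures, dimensions, and the penalty weight `lam` are assumptions, not the paper's exact models):

```python
import torch
import torch.nn as nn

# Sketch of the combined objective: a deep teacher autoencodes the outputs,
# a shallow basis-function student maps inputs to outputs, and a distance
# penalty aligns their latent representations.
class Teacher(nn.Module):
    def __init__(self, dim_y, dim_z):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(dim_y, 64), nn.ReLU(),
                                 nn.Linear(64, dim_z))
        self.dec = nn.Linear(dim_z, dim_y)

    def forward(self, y):
        z = self.enc(y)                  # latent code of the output
        return self.dec(z), z

class Student(nn.Module):
    def __init__(self, dim_u, dim_y, dim_z, n_basis=32):
        super().__init__()
        self.centers = nn.Parameter(torch.randn(n_basis, dim_u))
        self.proj = nn.Linear(n_basis, dim_z)
        self.out = nn.Linear(dim_z, dim_y)

    def forward(self, u):
        phi = torch.exp(-torch.cdist(u, self.centers) ** 2)  # RBF features
        z = self.proj(phi)
        return self.out(z), z

def total_loss(teacher, student, u, y, lam=1.0):
    y_t, z_t = teacher(y)
    y_s, z_s = student(u)
    mse = nn.functional.mse_loss
    # teacher objective + student objective + latent-distance penalty
    return mse(y_t, y) + mse(y_s, y) + lam * mse(z_s, z_t)
```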
( 3
min )
This report summarizes the 4th International Verification of Neural Networks
Competition (VNN-COMP 2023), held as a part of the 6th Workshop on Formal
Methods for ML-Enabled Autonomous Systems (FoMLAS), that was collocated with
the 35th International Conference on Computer-Aided Verification (CAV).
VNN-COMP is held annually to facilitate the fair and objective comparison of
state-of-the-art neural network verification tools, encourage the
standardization of tool interfaces, and bring together the neural network
verification community. To this end, standardized formats for networks (ONNX)
and specification (VNN-LIB) were defined, tools were evaluated on equal-cost
hardware (using an automatic evaluation pipeline based on AWS instances), and
tool parameters were chosen by the participants before the final test sets were
made public. In the 2023 iteration, 7 teams participated on a diverse set of 10
scored and 4 unscored benchmarks. This report summarizes the rules, benchmarks,
participating tools, results, and lessons learned from this iteration of this
competition.
( 2
min )
Though there has been substantial progress in developing quantum algorithms
to study classical datasets, the cost of simply \textit{loading} classical data
is an obstacle to quantum advantage. When the amplitude encoding is used,
loading an arbitrary classical vector requires up to exponential circuit depths
with respect to the number of qubits. Here, we address this ``input problem''
with two contributions. First, we introduce a circuit compilation method based
on tensor network (TN) theory. Our method -- AMLET (Automatic Multi-layer
Loader Exploiting TNs) -- proceeds via careful construction of a specific TN
topology and can be tailored to arbitrary circuit depths. Second, we perform
numerical experiments on real-world classical data from four distinct areas:
finance, images, fluid mechanics, and proteins. To the best of our knowledge,
this is the broadest numerical analysis to date of loading classical data into
a quantum computer. The required circuit depths are often several orders of
magnitude lower than the exponentially-scaling general loading algorithm would
require. Besides introducing a more efficient loading algorithm, this work
demonstrates that many classical datasets are loadable in depths that are much
shorter than previously expected, which has positive implications for speeding
up classical workloads on quantum computers.
( 3
min )
We propose INFAMOUS-NeRF, an implicit morphable face model that introduces
hypernetworks to NeRF to improve the representation power in the presence of
many training subjects. At the same time, INFAMOUS-NeRF resolves the classic
hypernetwork tradeoff of representation power and editability by learning
semantically-aligned latent spaces despite the subject-specific models, all
without requiring a large pretrained model. INFAMOUS-NeRF further introduces a
novel constraint to improve NeRF rendering along the face boundary. Our
constraint can leverage photometric surface rendering and multi-view
supervision to guide surface color prediction and improve rendering near the
surface. Finally, we introduce a novel, loss-guided adaptive sampling method
for more effective NeRF training by reducing the sampling redundancy. We show
quantitatively and qualitatively that our method achieves higher representation
power than prior face modeling methods in both controlled and in-the-wild
settings. Code and models will be released upon publication.
( 2
min )
The use of Mixed-Integer Linear Programming (MILP) models to represent neural
networks with Rectified Linear Unit (ReLU) activations has become increasingly
widespread in the last decade. This has enabled the use of MILP technology to
test, or stress, their behavior, to adversarially improve their training, and
to embed them in optimization models leveraging their predictive power. Many of
these MILP models rely on activation bounds, that is, bounds on the input
values of each neuron. In this work, we explore the tradeoff between the
tightness of these bounds and the computational effort of solving the resulting
MILP models. We provide guidelines for implementing these models based on the
impact of network structure, regularization, and rounding.
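For background (the standard encoding used in this literature, not necessarily the paper's exact formulation): a ReLU neuron $y = \max(0, w^\top x + b)$ with known pre-activation bounds $\ell \le w^\top x + b \le u$ is typically modeled with a binary variable $z$ as
\[
\begin{aligned}
& y \ge w^\top x + b, \qquad y \ge 0,\\
& y \le w^\top x + b - \ell\,(1 - z),\\
& y \le u\,z, \qquad z \in \{0, 1\}.
\end{aligned}
\]
Tighter values of $\ell$ and $u$ yield a tighter linear relaxation but may be costly to compute, which is precisely the tradeoff investigated here.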
( 2
min )
This study presents a novel approach to addressing the challenge of missing
data in multivariate time series, with a particular focus on the complexities
of healthcare data. Our Conditional Self-Attention Imputation (CSAI) model,
grounded in a transformer-based framework, introduces a conditional hidden
state initialization tailored to the intricacies of medical time series data.
This methodology diverges from traditional imputation techniques by
specifically targeting the imbalance in missing data distribution, a crucial
aspect often overlooked in healthcare datasets. By integrating advanced
knowledge embedding and a non-uniform masking strategy, CSAI adeptly adjusts to
the distinct patterns of missing data in Electronic Health Records (EHRs).
( 2
min )
The recycling of waste electrical and electronic equipment is an essential
tool in allowing for a circular economy, presenting the potential for
significant environmental and economic gain. However, traditional material
separation techniques, based on physical and chemical processes, require
substantial investment and do not apply to all cases. In this work, we
investigate using an image classification neural network as a potential means
to control an automated material separation process in treating smartphone
waste, acting as a more efficient, less costly, and more widely applicable
alternative to existing tools. We produced a dataset with 1,127 images of
pyrolyzed smartphone components, which was then used to train and assess a
VGG-16 image classification model. The model achieved 83.33% accuracy, lending
credence to the viability of using such a neural network in material
separation.
( 2
min )
Curriculum learning and imitation learning have been leveraged extensively in
the robotics domain. However, minimal research has been done on leveraging
these ideas on control tasks over highly stochastic time-series data. Here, we
theoretically and empirically explore these approaches in a representative
control task over complex time-series data. We implement the fundamental ideas
of curriculum learning via data augmentation, while imitation learning is
implemented via policy distillation from an oracle. Our findings reveal that
curriculum learning should be considered a novel direction in improving
control-task performance over complex time series. Our extensive random-seed,
out-of-sample experiments and ablation studies are highly encouraging for
curriculum learning in time-series control. These findings are especially
noteworthy as
we tune all overlapping hyperparameters on the baseline -- giving an advantage
to the baseline. On the other hand, we find that imitation learning should be
used with caution.
( 2
min )
Personalized Federated Learning (PFL) relies on collective data knowledge to
build customized models. However, non-IID data between clients poses
significant challenges, as collaborating with clients who have diverse data
distributions can harm local model performance, especially with limited
training data. To address this issue, we propose FedACS, a new PFL algorithm
with an Attention-based Client Selection mechanism. FedACS integrates an
attention mechanism to enhance collaboration among clients with similar data
distributions and mitigate the data scarcity issue. It prioritizes and
allocates resources based on data similarity. We further establish the
theoretical convergence behavior of FedACS. Experiments on CIFAR10 and FMNIST
validate FedACS's superiority over existing methods. By tackling non-IID data
challenges and data scarcity, FedACS offers a promising advance for
personalized federated learning.
( 2
min )
We explore whether Enriched Category Theory could provide the foundation of an
alternative approach to Machine Learning. This paper is the first to construct
and motivate a Machine Learning algorithm solely with Enriched Category Theory.
To supplement the evidence that Category Theory can be used to motivate robust
and explainable algorithms, we show that a series of reasonable assumptions
about a dataset leads to the construction of the Nearest Neighbours Algorithm,
in particular as an extension of the original dataset using profunctors in the
category of Lawvere metric spaces. This leads to a definition of an Enriched
Nearest Neighbours Algorithm, which consequently also produces an enriched
form of the Voronoi diagram. This paper is intended to be accessible without
any knowledge of Category Theory.
( 2
min )
Particle-based Variational Inference (ParVI) methods approximate the target
distribution by iteratively evolving finite weighted particle systems. Recent
advances in ParVI methods reveal the benefits of accelerated position-update
strategies and dynamic weight-adjustment approaches. In this paper, we propose
the first ParVI framework that combines accelerated position updates with
dynamic weight adjustment, named the General Accelerated Dynamic-Weight
Particle-based Variational Inference (GAD-PVI) framework. GAD-PVI simulates
the semi-Hamiltonian gradient flow on a novel Information-Fisher-Rao space,
which yields an additional decrease in the local functional dissipation.
GAD-PVI is compatible with different dissimilarity
functionals and associated smoothing approaches under three information
metrics. Experiments on both synthetic and real-world data demonstrate the
faster convergence and reduced approximation error of GAD-PVI methods over the
state-of-the-art.
( 2
min )
Randomized smoothing is currently the state-of-the-art method that provides
certified robustness for deep neural networks. However, due to its excessively
conservative nature, this method of incomplete verification often cannot
achieve an adequate certified radius on real-world datasets. One way to obtain
a larger certified radius is to use an input-specific algorithm instead of
using a fixed Gaussian filter for all data points. Several methods based on
this idea have been proposed, but they either suffer from high computational
costs or gain marginal improvement in certified radius. In this work, we show
that by exploiting the quasiconvex problem structure, we can find the optimal
certified radii for most data points with slight computational overhead. This
observation leads to an efficient and effective input-specific randomized
smoothing algorithm. We conduct extensive experiments and empirical analysis on
CIFAR-10 and ImageNet. The results show that the proposed method significantly
enhances the certified radii with low computational overhead.
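For background, the certified radius being maximized follows the standard randomized-smoothing formula of Cohen et al.; below is a toy sketch of an input-specific search over $\sigma$ (our illustration, with placeholder class probabilities that in practice themselves depend on $\sigma$):

```python
import numpy as np
from scipy.stats import norm

# Standard certified radius R = sigma/2 * (Phi^{-1}(pA) - Phi^{-1}(pB)).
def certified_radius(p_a, p_b, sigma):
    return sigma / 2 * (norm.ppf(p_a) - norm.ppf(p_b))

# Toy input-specific search: pick the sigma maximizing R for this input.
# p_a, p_b are fixed placeholders here; in reality they vary with sigma,
# which is where the paper's quasiconvex structure comes in.
p_a, p_b = 0.92, 0.04
sigmas = np.linspace(0.1, 1.0, 10)
radii = [certified_radius(p_a, p_b, s) for s in sigmas]
print("best sigma:", sigmas[int(np.argmax(radii))])
```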
( 2
min )
We consider the optimization problem associated with fitting two-layer ReLU
networks with respect to the squared loss, where labels are assumed to be
generated by a target network. Focusing first on standard Gaussian inputs, we
show that the structure of spurious local minima detected by stochastic
gradient descent (SGD) is, in a well-defined sense, the \emph{least loss of
symmetry} with respect to the target weights. A closer look at the analysis
indicates that this principle of least symmetry breaking may apply to a broader
range of settings. Motivated by this, we conduct a series of experiments which
corroborate this hypothesis for different classes of non-isotropic non-product
distributions, smooth activation functions and networks with a few layers.
( 2
min )
Inverse reinforcement learning (IRL) usually assumes the model of the reward
function is pre-specified and estimates the parameter only. However, how to
determine a proper reward model is nontrivial. A simplistic model is less
likely to contain the real reward function, while a model with high complexity
leads to substantial computation cost and risks overfitting. This paper
addresses this trade-off in IRL model selection by introducing the structural
risk minimization (SRM) method from statistical learning. SRM selects an
optimal reward function class from a hypothesis set minimizing both estimation
error and model complexity. To formulate an SRM scheme for IRL, we estimate the
policy gradient from demonstrations to serve as the empirical risk, and we use
an upper bound on the Rademacher complexity of the hypothesis classes as the
model penalty. We further present a learning guarantee. In particular, we provide explicit
SRM for the common linear weighted sum setting in IRL. Simulations demonstrate
the performance and efficiency of our scheme.
( 2
min )
We present convincing empirical results on the application of Randomized
Signature Methods for non-linear, non-parametric drift estimation for a
multi-variate financial market. Even though drift estimation is notoriously
ill-posed due to the small signal-to-noise ratio, one can still try to learn
optimal non-linear maps from data to future returns for the purposes of
portfolio optimization. Randomized Signatures, in contrast to classical
signatures, scale to high-dimensional markets and provide features on the same
scale.
We do not contribute to the theory of Randomized Signatures here, but rather
present our empirical findings on portfolio selection in real world settings
including real market data and transaction costs.
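For readers unfamiliar with the feature map, here is a minimal sketch of a randomized-signature computation in the spirit of this line of work (our illustration; the dimensions, activation, and scaling are assumptions):

```python
import numpy as np

# Minimal sketch of a randomized signature feature map: a controlled
# recursion driven by path increments with fixed random matrices.
def randomized_signature(path, dim_sig=64, seed=0):
    """path: array of shape (T, d) of market observations."""
    rng = np.random.default_rng(seed)
    T, d = path.shape
    A = rng.normal(scale=1.0 / np.sqrt(dim_sig), size=(d, dim_sig, dim_sig))
    b = rng.normal(size=(d, dim_sig))
    z = rng.normal(size=dim_sig)
    for dx in np.diff(path, axis=0):
        # controlled ODE step: dz = sum_i tanh(A_i z + b_i) dx^i
        z = z + sum(np.tanh(A[i] @ z + b[i]) * dx[i] for i in range(d))
    return z  # features of the whole path, on a common scale

# These features can then feed a linear map onto future returns.
prices = np.cumsum(np.random.default_rng(1).normal(size=(250, 5)), axis=0)
print(randomized_signature(prices)[:5])
```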
( 2
min )
Deep learning algorithms, especially Transformer-based models, have achieved
significant performance by capturing long-range dependencies and historical
information. However, the power of convolution has not been fully investigated.
Moreover, most existing works ignore the dynamic interaction among variables
and evolutionary noise in series. Addressing these issues, we propose a
Hierarchical Memorizing Network (HMNet). In particular, a hierarchical
convolution structure is introduced to extract the information from the series
at various scales. Besides, we propose a dynamic variable interaction module to
learn the varying correlation and an adaptive denoising module to search and
exploit similar patterns to alleviate noise. These modules cooperate with the
hierarchical structure, operating from fine to coarse granularity.
Experiments on five benchmarks demonstrate that HMNet significantly outperforms
the state-of-the-art models by 10.6% on MSE and 5.7% on MAE. Our code is
released at https://github.com/yzhHoward/HMNet.
( 2
min )
In this paper, we present XuanCe, a comprehensive and unified deep
reinforcement learning (DRL) library designed to be compatible with PyTorch,
TensorFlow, and MindSpore. XuanCe offers a wide range of functionalities,
including over 40 classical DRL and multi-agent DRL algorithms, with the
flexibility to easily incorporate new algorithms and environments. It is a
versatile DRL library that supports CPU, GPU, and Ascend, and can be executed
on various operating systems such as Ubuntu, Windows, MacOS, and EulerOS.
Extensive benchmarks conducted on popular environments including MuJoCo, Atari,
and StarCraftII multi-agent challenge demonstrate the library's impressive
performance. XuanCe is open-source and can be accessed at
https://github.com/agi-brain/xuance.git.
( 2
min )
Recent work found high mutual information between the learned representations
of large language models (LLMs) and the geospatial properties of their inputs,
hinting at an emergent internal model of space. However, whether this internal
space model has any causal effect on the LLMs' behavior was not answered by
that work, leading to criticism of these findings as mere statistical
correlation. Our study focuses on uncovering the causality of the spatial
representations in LLMs. In particular, we discovered potential spatial
representations in DeBERTa and GPT-Neo using representational similarity
analysis and linear and non-linear probing. Our causal intervention
experiments showed that the spatial representations influence the model's
performance on next-word prediction and on a downstream task that relies on
geospatial information. Our experiments suggest that LLMs learn and use an
internal model of space when solving geospatial tasks.
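A minimal sketch of the linear-probing step (our illustration with stand-in arrays; in the actual study the features would be hidden states for geospatial inputs and the targets their coordinates):

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

# Hypothetical linear probe: test whether hidden states linearly encode
# geospatial coordinates. `hidden` and `coords` are stand-ins for
# precomputed LLM representations and (lat, lon) targets.
hidden = np.random.randn(1000, 768)
coords = np.random.randn(1000, 2)

X_tr, X_te, y_tr, y_te = train_test_split(hidden, coords, random_state=0)
probe = Ridge(alpha=1.0).fit(X_tr, y_tr)
print("probe R^2 on held-out places:", probe.score(X_te, y_te))
```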
( 2
min )
Leveraging knowledge from multiple tasks by introducing a small number of
task-specific parameters, known as adapters, into each transformer layer has
recently received much attention. However, adding an extra fusion layer to
implement knowledge composition not only increases inference time but also
does not scale for some applications. To avoid these issues, we propose a
two-stage knowledge distillation algorithm called AdapterDistillation. In the
first stage, we extract task-specific knowledge by
using local data to train a student adapter. In the second stage, we distill
the knowledge from the existing teacher adapters into the student adapter to
help its inference. Extensive experiments on frequently asked question
retrieval in task-oriented dialog systems validate the efficiency of
AdapterDistillation. We show that AdapterDistillation outperforms existing
algorithms in terms of accuracy, resource consumption and inference time.
( 2
min )
This paper analyses LightGCN in the context of graph recommendation
algorithms. Although Graph Convolutional Networks were originally designed for
graph classification, their non-linear operations are not always essential.
LightGCN enables linear propagation of embeddings, enhancing performance. We
reproduce the original findings, assess LightGCN's robustness on diverse
datasets and metrics, and explore Graph Diffusion as an augmentation of signal
propagation in LightGCN.
( 2
min )
We present Mini-BEHAVIOR, a novel benchmark for embodied AI that challenges
agents to use reasoning and decision-making skills to solve complex activities
that resemble everyday human challenges. The Mini-BEHAVIOR environment is a
fast, realistic Gridworld environment that offers the benefits of rapid
prototyping and ease of use while preserving a symbolic level of physical
realism and complexity found in complex embodied AI benchmarks. We introduce
key features such as procedural generation, to enable the creation of countless
task variations and support open-ended learning. Mini-BEHAVIOR provides
implementations of various household tasks from the original BEHAVIOR
benchmark, along with starter code for data collection and reinforcement
learning agent training. In essence, Mini-BEHAVIOR offers a fast, open-ended
benchmark for evaluating decision-making and planning solutions in embodied
AI, serving as a user-friendly entry point that simplifies the assessment and
development of such solutions. Code is publicly available at
https://github.com/StanfordVL/mini_behavior.
( 2
min )
In this paper, we propose the use of self-supervised pretraining on a large
unlabelled data set to improve the performance of a personalized voice activity
detection (VAD) model in adverse conditions. We pretrain a long short-term
memory (LSTM)-encoder using the autoregressive predictive coding (APC)
framework and fine-tune it for personalized VAD. We also propose a denoising
variant of APC, with the goal of improving the robustness of personalized VAD.
The trained models are systematically evaluated on both clean speech and speech
contaminated by various types of noise at different SNR-levels and compared to
a purely supervised model. Our experiments show that self-supervised
pretraining not only improves performance in clean conditions, but also yields
models which are more robust to adverse conditions compared to purely
supervised learning.
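A minimal sketch of APC pretraining as described (our illustration; the feature sizes and prediction shift are assumptions, and the denoising variant would feed noisy inputs while keeping clean targets):

```python
import torch
import torch.nn as nn

# Sketch of autoregressive predictive coding (APC): an LSTM encoder
# predicts a frame `shift` steps into the future from past context.
class APCEncoder(nn.Module):
    def __init__(self, n_mels=40, hidden=128):
        super().__init__()
        self.lstm = nn.LSTM(n_mels, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_mels)

    def forward(self, x):            # x: (batch, time, n_mels)
        h, _ = self.lstm(x)
        return self.head(h)

def apc_loss(model, feats, shift=3):
    pred = model(feats[:, :-shift])  # predict from past frames only
    target = feats[:, shift:]        # future frames as targets
    return nn.functional.l1_loss(pred, target)

model = APCEncoder()
feats = torch.randn(8, 200, 40)      # dummy log-mel batch
loss = apc_loss(model, feats)
loss.backward()
```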
( 2
min )
We present a comprehensive solution to learn and improve text-to-image models
from human preference feedback. To begin with, we build ImageReward -- the
first general-purpose text-to-image human preference reward model -- to
effectively encode human preferences. Its training is based on our systematic
annotation pipeline including rating and ranking, which collects 137k expert
comparisons to date. In human evaluation, ImageReward outperforms existing
scoring models and metrics, making it a promising automatic metric for
evaluating text-to-image synthesis. On top of it, we propose Reward Feedback
Learning (ReFL), a direct tuning algorithm to optimize diffusion models against
a scorer. Both automatic and human evaluation support ReFL's advantages over
compared methods. All code and datasets are provided at
\url{https://github.com/THUDM/ImageReward}.
( 2
min )
Self-supervised learning (SSL) in audio holds significant potential across
various domains, particularly in situations where abundant, unlabeled data is
readily available at no cost. This is particularly pertinent in bioacoustics,
where biologists routinely collect extensive sound datasets from the natural
environment. In this study, we demonstrate that SSL is capable of acquiring
meaningful representations of bird sounds from audio recordings without the
need for annotations. Our experiments showcase that these learned
representations exhibit the capacity to generalize to new bird species in
few-shot learning (FSL) scenarios. Additionally, we show that selecting windows
with high bird activation for self-supervised learning, using a pretrained
audio neural network, significantly enhances the quality of the learned
representations.
( 2
min )
Foundation models, specifically Large Language Models (LLMs), have lately
gained widespread attention and adoption. Reinforcement Learning with Human
Feedback (RLHF) involves training a reward model to capture desired behaviors,
which is then used to align LLMs. These reward models are additionally used at
inference time to estimate how well LLM responses adhere to those desired
behaviors.
However, there is little work measuring how robust these reward models are to
distribution shifts. In this work, we evaluate how reward model performance -
measured via accuracy and calibration (i.e. alignment between accuracy and
confidence) - is affected by distribution shift. We show novel calibration
patterns and accuracy drops due to OOD prompts and responses, and that the
reward model is more sensitive to shifts in responses than prompts.
Additionally, we adapt an OOD detection technique commonly used in
classification to the reward model setting to detect these distribution shifts
in prompts and responses.
( 2
min )
We explore the possibility of fully replacing a plasma physics kinetic
simulator with a graph neural network-based simulator. We focus on this class
of surrogate models given the similarity between their message-passing update
mechanism and the traditional physics solver update, and the possibility of
enforcing known physical priors into the graph construction and update. We show
that our model learns the kinetic plasma dynamics of the one-dimensional plasma
model, a predecessor of contemporary kinetic plasma simulation codes, and
recovers a wide range of well-known kinetic plasma processes, including plasma
thermalization, electrostatic fluctuations about thermal equilibrium, and the
drag on a fast sheet and Landau damping. We compare the performance against the
original plasma model in terms of run-time, conservation laws, and temporal
evolution of key physical quantities. The limitations of the model are
presented and possible directions for higher-dimensional surrogate models for
kinetic plasmas are discussed.
( 2
min )
Three-dimensional native states of natural proteins display recurring and
hierarchical patterns. Yet, traditional graph-based modeling of protein
structures is often limited to operate within a single fine-grained resolution,
and lacks hourglass neural architectures to learn those high-level building
blocks. We narrow this gap by introducing Ophiuchus, an SO(3)-equivariant
coarse-graining model that efficiently operates on all-atom protein structures.
Our model departs from current approaches that employ graph modeling, instead
focusing on local convolutional coarsening to model sequence-motif interactions
with efficient time complexity in protein length. We measure the reconstruction
capabilities of Ophiuchus across different compression rates, and compare it to
existing models. We examine the learned latent space and demonstrate its
utility through conformational interpolation. Finally, we leverage denoising
diffusion probabilistic models (DDPM) in the latent space to efficiently sample
protein structures. Our experiments demonstrate Ophiuchus to be a scalable
basis for efficient protein modeling and generation.
( 2
min )
We propose PromptTTS++, a prompt-based text-to-speech (TTS) synthesis system
that allows control over speaker identity using natural language descriptions.
To control speaker identity within the prompt-based TTS framework, we introduce
the concept of speaker prompt, which describes voice characteristics (e.g.,
gender-neutral, young, old, and muffled) designed to be approximately
independent of speaking style. Since there is no large-scale dataset containing
speaker prompts, we first construct a dataset based on the LibriTTS-R corpus
with manually annotated speaker prompts. We then employ a diffusion-based
acoustic model with mixture density networks to model diverse speaker factors
in the training data. Unlike previous studies that rely on style prompts
describing only a limited aspect of speaker individuality, such as pitch,
speaking speed, and energy, our method utilizes an additional speaker prompt to
effectively learn the mapping from natural language descriptions to the
acoustic features of diverse speakers. Our subjective evaluation results show
that the proposed method can better control speaker characteristics than the
methods without the speaker prompt. Audio samples are available at
https://reppy4620.github.io/demo.promptttspp/.
( 2
min )
Model-based sequential approaches to discrete "black-box" optimization,
including Bayesian optimization techniques, often access the same points
multiple times for a given objective function of interest, resulting in many
steps to find the global optimum. Here, we numerically study the effect of a
postprocessing method for Bayesian optimization that strictly prohibits
duplicated samples in the dataset. We find that the postprocessing method
significantly reduces the number of sequential steps needed to find the global
optimum, especially when the acquisition function is maximum a posteriori
estimation. Our results provide a simple but general strategy to address the
slow convergence of Bayesian optimization for high-dimensional problems.
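The postprocessing idea admits a very simple sketch on a discrete candidate set (our illustration, with placeholder acquisition values):

```python
import numpy as np

# Mask out already-sampled candidates before taking the acquisition argmax,
# so duplicates are strictly prohibited.
def next_point(acq_values, candidates, sampled):
    """acq_values: (N,) scores; candidates: (N, d); sampled: set of tuples."""
    scores = acq_values.copy()
    for i, c in enumerate(candidates):
        if tuple(c) in sampled:
            scores[i] = -np.inf      # never re-propose a seen point
    return candidates[int(np.argmax(scores))]

candidates = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
sampled = {(0, 1), (1, 1)}
acq = np.array([0.2, 0.9, 0.5, 0.8])
print(next_point(acq, candidates, sampled))   # -> [1 0], best unseen point
```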
( 2
min )
We propose to enhance the training of physics-informed neural networks
(PINNs). To this aim, we introduce nonlinear additive and multiplicative
preconditioning strategies for the widely used L-BFGS optimizer. The nonlinear
preconditioners are constructed by utilizing the Schwarz domain-decomposition
framework, where the parameters of the network are decomposed in a layer-wise
manner. Through a series of numerical experiments, we demonstrate that both,
additive and multiplicative preconditioners significantly improve the
convergence of the standard L-BFGS optimizer, while providing more accurate
solutions of the underlying partial differential equations. Moreover, the
additive preconditioner is inherently parallel, thus giving rise to a novel
approach to model parallelism.
( 2
min )
The training process of ReLU neural networks often exhibits complicated
nonlinear phenomena. The nonlinearity of models and non-convexity of loss pose
significant challenges for theoretical analysis. Therefore, most previous
theoretical works on the optimization dynamics of neural networks focus either
on local analysis (like the end of training) or approximate linear models (like
Neural Tangent Kernel). In this work, we conduct a complete theoretical
characterization of the training process of a two-layer ReLU network trained by
Gradient Flow on a linearly separable data. In this specific setting, our
analysis captures the whole optimization process starting from random
initialization to final convergence. Despite the relatively simple model and
data that we studied, we reveal four different phases from the whole training
process showing a general simplifying-to-complicating learning trend. Specific
nonlinear behaviors can also be precisely identified and captured
theoretically, such as initial condensation, saddle-to-plateau dynamics,
plateau escape, changes of activation patterns, learning with increasing
complexity, etc.
( 2
min )
We propose a new method to estimate a root-directed spanning tree from
extreme data. A prominent example is a river network, to be discovered from
extreme flow measured at a set of stations. Our new algorithm utilizes
qualitative aspects of a max-linear Bayesian network, which has been designed
for modelling causality in extremes. The algorithm estimates bivariate scores
and returns a root-directed spanning tree. It performs extremely well on
benchmark data and new data. We prove that the new estimator is consistent
under a max-linear Bayesian network model with noise. We also assess its
strengths and limitations in a small simulation study.
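A hypothetical sketch of the overall recipe (the score function below is a stand-in, not the paper's bivariate estimator): turn pairwise directional scores into a spanning tree via a maximum spanning arborescence.

```python
import networkx as nx

# Build a directed graph from pairwise scores and extract the maximum
# spanning arborescence; reverse edge directions afterwards if a
# root-directed orientation is required.
def estimate_tree(scores):
    """scores[u][v]: strength of evidence for an edge u -> v."""
    g = nx.DiGraph()
    for u, row in scores.items():
        for v, s in row.items():
            g.add_edge(u, v, weight=s)
    return nx.maximum_spanning_arborescence(g, attr="weight")

scores = {"A": {"B": 0.9, "C": 0.2},
          "B": {"C": 0.8},
          "C": {"B": 0.1}}
tree = estimate_tree(scores)
print(sorted(tree.edges()))   # [('A', 'B'), ('B', 'C')]
```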
( 2
min )
We present a short tutorial on the use of the R gasper package. Gasper is a
package dedicated to signal processing on graphs. It also provides an
interface to the SuiteSparse Matrix Collection.
( 2
min )
This paper studies experimental designs for estimation and inference on
policies with spillover effects. Units are organized into a finite number of
large clusters and interact in unknown ways within each cluster. First, we
introduce a single-wave experiment that, by varying the randomization across
cluster pairs, estimates the marginal effect of a change in treatment
probabilities, taking spillover effects into account. Using the marginal
effect, we propose a test for policy optimality. Second, we design a
multiple-wave experiment to estimate welfare-maximizing treatment rules. We
provide strong theoretical guarantees and an implementation in a large-scale
field experiment.
( 2
min )
The use of transfer learning with deep neural networks has increasingly
become widespread for deploying well-tested computer vision systems to newer
domains, especially those with limited datasets. We describe a transfer
learning use case for a domain with a data-starved regime, having fewer than
100 labeled target samples. We evaluate the effectiveness of convolutional
feature extraction and fine-tuning of overparameterized models with respect to
the size of target training data, as well as their generalization performance
on data with covariate shift, or out-of-distribution (OOD) data. Our
experiments demonstrate that both overparameterization and feature reuse
contribute to the successful application of transfer learning in training image
classifiers in data-starved regimes. We provide visual explanations to support
our findings and conclude that transfer learning enhances the performance of
CNN architectures in data-starved regimes.
( 2
min )
These lecture notes give a statistical perspective on the foundations of
reinforcement learning and interactive decision making. We present a unifying
framework for addressing the exploration-exploitation dilemma using frequentist
and Bayesian approaches, with connections and parallels between supervised
learning/estimation and decision making as an overarching theme. Special
attention is paid to function approximation and flexible model classes such as
neural networks. Topics covered include multi-armed and contextual bandits,
structured bandits, and reinforcement learning with high-dimensional feedback.
( 2
min )
This paper introduces novel alternate training procedures for hard-parameter
sharing Multi-Task Neural Networks (MTNNs). Traditional MTNN training faces
challenges in managing conflicting loss gradients, often yielding sub-optimal
performance. The proposed alternate training method updates shared and
task-specific weights alternately, exploiting the multi-head architecture of
the model. This approach reduces computational costs, enhances training
regularization, and improves generalization. Convergence properties similar to
those of the classical stochastic gradient method are established. Empirical
experiments demonstrate delayed overfitting, improved prediction, and reduced
computational demands. In summary, our alternate training procedures offer a
promising advancement for the training of hard-parameter sharing MTNNs.
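A minimal sketch of the alternating update (our illustration; the model split and the strict even/odd schedule are assumptions):

```python
import torch

# Alternate training for a hard-parameter-sharing multi-task network:
# shared and task-specific weights are updated in alternating steps.
shared = torch.nn.Linear(16, 8)
heads = [torch.nn.Linear(8, 1) for _ in range(2)]   # one head per task

opt_shared = torch.optim.SGD(shared.parameters(), lr=1e-2)
opt_heads = torch.optim.SGD([p for h in heads for p in h.parameters()], lr=1e-2)

def multitask_loss(x, ys):
    z = torch.relu(shared(x))
    return sum(torch.nn.functional.mse_loss(h(z), y) for h, y in zip(heads, ys))

x = torch.randn(32, 16)
ys = [torch.randn(32, 1), torch.randn(32, 1)]
for step in range(100):
    opt = opt_shared if step % 2 == 0 else opt_heads   # alternate the update
    opt.zero_grad()
    multitask_loss(x, ys).backward()
    opt.step()
```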
( 2
min )
We study the problem of learning linear temporal logic (LTL) formulas from
examples, as a first step towards expressing a property separating positive and
negative instances in a way that is comprehensible for humans. In this paper we
initiate the study of the computational complexity of the problem. Our main
results are hardness results: we show that the LTL learning problem is
NP-complete, both for the full logic and for almost all of its fragments. This
motivates the search for efficient heuristics, and highlights the complexity of
expressing separating properties in concise natural language.
( 2
min )
Generalized Labeled Multi-Bernoulli (GLMB) densities arise in a host of
multi-object system applications analogous to Gaussians in single-object
filtering. However, computing the GLMB filtering density requires solving
NP-hard problems. To alleviate this computational bottleneck, we develop a
linear complexity Gibbs sampling framework for GLMB density computation.
Specifically, we propose a tempered Gibbs sampler that exploits the structure
of the GLMB filtering density to achieve an $\mathcal{O}(T(P+M))$ complexity,
where $T$ is the number of iterations of the algorithm, and $P$ and $M$ are
the numbers of hypothesized objects and measurements, respectively. This
innovation enables the GLMB
filter implementation to be reduced from an $\mathcal{O}(TP^{2}M)$ complexity
to $\mathcal{O}(T(P+M+\log T)+PM)$. Moreover, the proposed framework provides
the flexibility for trade-offs between tracking performance and computational
load. Convergence of the proposed Gibbs sampler is established, and numerical
studies are presented to validate the proposed GLMB filter implementation.
( 2
min )
We introduce a new empirical Bayes approach for large-scale multiple linear
regression. Our approach combines two key ideas: (i) the use of flexible
"adaptive shrinkage" priors, which approximate the nonparametric family of
scale mixture of normal distributions by a finite mixture of normal
distributions; and (ii) the use of variational approximations to efficiently
estimate prior hyperparameters and compute approximate posteriors. Combining
these two ideas results in fast and flexible methods, with computational speed
comparable to fast penalized regression methods such as the Lasso, and with
superior prediction accuracy across a wide range of scenarios. Furthermore, we
show that the posterior mean from our method can be interpreted as solving a
penalized regression problem, with the precise form of the penalty function
being learned from the data by directly solving an optimization problem (rather
than being tuned by cross-validation). Our methods are implemented in an R
package, mr.ash.alpha, available from
https://github.com/stephenslab/mr.ash.alpha
( 2
min )
One of the most recent and fascinating breakthroughs in artificial
intelligence is ChatGPT, a chatbot which can simulate human conversation.
ChatGPT is an instance of GPT-4, which is a language model based on generative
pretrained transformers. So if one wants to study, from a theoretical point of
view, how powerful such artificial intelligence can be, one approach is to
consider transformer networks and to study which problems one can solve with
these networks theoretically. Here it is not only important what kind of
models these networks can approximate, or how well they can generalize the
knowledge learned by choosing the best possible approximation to a concrete
data set, but also how well optimization of such transformer networks on a
concrete data set works. In this article, we consider all three of these
aspects simultaneously and show a theoretical upper bound on the
misclassification probability of a transformer network fitted to the observed
data. For simplicity, we focus in this context on transformer encoder networks
which can be applied to define an estimate in the context of a classification
problem involving natural language.
( 2
min )
We propose a new method called the N-particle underdamped Langevin algorithm
for optimizing a special class of non-linear functionals defined over the space
of probability measures. Examples of problems with this formulation include
training neural networks in the mean-field regime, density estimation, and
kernel Stein discrepancy minimization. Our algorithm is based on a novel
space-time discretization of the mean-field underdamped Langevin dynamics, for
which we provide a new, fast mixing guarantee. In addition, we demonstrate that
our algorithm converges globally in total variation distance, bridging the
theoretical gap between the dynamics and its practical implementation.
( 2
min )
Whether abundant, endangered or extinct, animal species are the focus of countless AI-powered conservation projects. These initiatives — accelerated using NVIDIA GPUs, deep learning software and robotics technology — are alerting conservationists to poaching threats, powering more sustainable aquaculture and helping scientists monitor coral reef health. Take a safari through the NVIDIA Blog’s top animal…
( 7
min )
Before ringing in the new year, GeForce NOW is taking a look back at a 2023 full of top-notch gaming. Explore GeForce NOW’s year in review, which brought more hit games, improved service features and the launch of the Ultimate membership tier. Plus, GFN Thursday is raising a toast to the GeForce NOW community by…
( 7
min )
In this paper, we study asynchronous stochastic approximation algorithms
without communication delays. Our main contribution is a stability proof for
these algorithms that extends a method of Borkar and Meyn by accommodating more
general noise conditions. We also derive convergence results from this
stability result and discuss their application in important average-reward
reinforcement learning problems.
( 2
min )
Out-of-distribution (OOD) detection is an important topic for real-world
machine learning systems, but settings with limited in-distribution samples
have been underexplored. Such few-shot OOD settings are challenging, as models
have scarce opportunities to learn the data distribution before being tasked
with identifying OOD samples. Indeed, we demonstrate that recent
state-of-the-art OOD methods fail to outperform simple baselines in the
few-shot setting. We thus propose a hypernetwork framework called HyperMix,
using Mixup on the generated classifier parameters, as well as a natural
out-of-episode outlier exposure technique that does not require an additional
outlier dataset. We conduct experiments on CIFAR-FS and MiniImageNet,
significantly outperforming other OOD methods in the few-shot regime.
( 2
min )
Recent advancements in sensing and communication facilitate obtaining
high-frequency real-time data from various physical systems like power
networks, climate systems, biological networks, etc. However, since the data
are recorded by physical sensors, it is natural that the obtained data is
corrupted by measurement noise. In this paper, we present a novel algorithm for
online real-time learning of dynamical systems from noisy time-series data,
which employs the Robust Koopman operator framework to mitigate the effect of
measurement noise. The proposed algorithm has three main advantages: a) it
allows for online real-time monitoring of a dynamical system; b) it obtains a
linear representation of the underlying dynamical system, thus enabling the
user to use linear systems theory for analysis and control of the system; c) it
is computationally fast and less intensive than the popular Extended Dynamic
Mode Decomposition (EDMD) algorithm. We illustrate the efficiency of the
proposed algorithm by applying it to identify the Van der Pol oscillator, the
IEEE 68 bus system, and a ring network of Van der Pol oscillators.
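For orientation, here is a rough sketch of the underlying idea via standard least-squares Koopman/DMD (not the paper's noise-robust online variant):

```python
import numpy as np

# Fit a linear operator K that advances state snapshots one step forward,
# from noisy time-series data of a known linear system.
rng = np.random.default_rng(0)
T, n = 500, 3
X = np.zeros((T, n))
X[0] = rng.normal(size=n)
A_true = np.array([[0.9, -0.2, 0.0], [0.2, 0.9, 0.0], [0.0, 0.0, 0.5]])
for t in range(T - 1):
    X[t + 1] = A_true @ X[t] + 0.01 * rng.normal(size=n)  # noisy measurements

past, future = X[:-1], X[1:]
K = np.linalg.lstsq(past, future, rcond=None)[0].T   # future_t ~ K @ past_t
print("spectral radius of learned operator:", max(abs(np.linalg.eigvals(K))))
```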
( 2
min )
Many companies rely on APIs of managed AI models such as OpenAI's GPT-4 to
create AI-enabled experiences in their products. Along with the benefits of
ease of use and shortened time to production, this reliance on proprietary APIs
has downsides in terms of model control, performance reliability, up-time
predictability, and cost. At the same time, there has been a flurry of open
source small language models (SLMs) that have been made available for
commercial use. However, their readiness to replace existing capabilities
remains unclear, and a systematic approach to test these models is not readily
available. In this paper, we present a systematic evaluation methodology for,
and characterization of, modern open-source SLMs and their trade-offs when
replacing a proprietary LLM API for a real-world product feature. We have
designed SLaM, an automated analysis tool that enables the quantitative and
qualitative testing of product features utilizing arbitrary SLMs. Using SLaM,
we examine both the quality and the performance characteristics of modern SLMs
relative to an existing customer-facing OpenAI-based implementation. We find
that across 9 SLMs and 29 variants, we observe competitive quality-of-results
for our use case, significant performance consistency improvement, and a cost
reduction of 5x-29x when compared to OpenAI GPT-4.
( 3
min )
In stochastic zeroth-order optimization, a problem of practical relevance is
understanding how to fully exploit the local geometry of the underlying
objective function. We consider a fundamental setting in which the objective
function is quadratic, and provide the first tight characterization of the
optimal Hessian-dependent sample complexity. Our contribution is twofold.
First, from an information-theoretic point of view, we prove tight lower bounds
on Hessian-dependent complexities by introducing a concept called energy
allocation, which captures the interaction between the searching algorithm and
the geometry of objective functions. A matching upper bound is obtained by
solving the optimal energy spectrum. Then, algorithmically, we show the
existence of a Hessian-independent algorithm that universally achieves the
asymptotic optimal sample complexities for all Hessian instances. The optimal
sample complexities achieved by our algorithm remain valid for heavy-tailed
noise distributions, which are enabled by a truncation method.
( 2
min )
This paper explores the image synthesis capabilities of GPT-4, a leading
multi-modal large language model. We establish a benchmark for evaluating the
fidelity of texture features in images generated by GPT-4, comprising manually
painted pictures and their AI-generated counterparts. The contributions of this
study are threefold: First, we provide an in-depth analysis of the fidelity of
image synthesis features based on GPT-4, marking the first such study on this
state-of-the-art model. Second, our quantitative and qualitative experiments
fully reveal the limitations of the GPT-4 model in image synthesis. Third, we
have compiled a unique benchmark of manual drawings and corresponding
GPT-4-generated images, introducing a new task to advance fidelity research in
AI-generated content (AIGC). The dataset is available at:
\url{https://github.com/rickwang28574/DeepArt}.
( 2
min )
This paper presents a Gaussian Process (GP) framework, a non-parametric
technique widely acknowledged for regression and classification tasks, to
address inverse problems in mean field games (MFGs). By leveraging GPs, we aim
to recover agents' strategic actions and the environment's configurations from
partial and noisy observations of the population of agents and the setup of the
environment. Our method is a probabilistic tool to infer the behaviors of
agents in MFGs from data in scenarios where the comprehensive dataset is either
inaccessible or contaminated by noises.
( 2
min )
We propose a simple multivariate normality test based on Kac-Bernstein's
characterization, which can be conducted by utilising existing statistical
independence tests for sums and differences of data samples. We also perform
its empirical investigation, which reveals that for high-dimensional data, the
proposed approach may be more efficient than the alternative ones. The
accompanying code repository is provided at \url{https://shorturl.at/rtuy5}.
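A hedged sketch of the idea (our illustration; the paper uses proper statistical independence tests rather than the crude correlation probe below): for i.i.d. normal $X, Y$, the sum $X+Y$ and difference $X-Y$ are independent, so detected dependence between them is evidence against normality.

```python
import numpy as np
from scipy import stats

# Split the sample into two halves, form sums and differences, and probe
# their dependence; pearsonr on absolute values is a crude stand-in for a
# full independence test.
def normality_check(sample, seed=0):
    rng = np.random.default_rng(seed)
    sample = rng.permutation(sample)
    half = len(sample) // 2
    x, y = sample[:half], sample[half:2 * half]   # two independent halves
    s, d = x + y, x - y
    r, p = stats.pearsonr(np.abs(s), np.abs(d))
    return r, p

data = np.random.default_rng(1).normal(size=2000)
print(normality_check(data))   # large p-value: no evidence against normality
```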
( 2
min )
We explore the applications of random matrix theory (RMT) in the training of
deep neural networks (DNNs), focusing on layer pruning, that is, reducing the
number of DNN parameters (weights). Our numerical results show that this
pruning leads to a drastic reduction in parameters without reducing the
accuracy of DNNs and CNNs. Moreover, pruning fully connected DNNs actually
increases the accuracy and decreases the variance for random initializations.
Our numerics indicate that this enhancement in accuracy is due to the
simplification of the loss landscape. We next provide rigorous mathematical
underpinning of these numerical results by proving the RMT-based Pruning
Theorem. Our results offer valuable insights into the practical application of
RMT for the creation of more efficient and accurate deep-learning models.
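To give a flavor of how RMT enters, the sketch below prunes a weight matrix by
discarding singular values inside the Marchenko-Pastur bulk expected for pure
i.i.d. noise; the noise-scale estimate and the threshold are illustrative
assumptions of ours, not the paper's Pruning Theorem:

    import numpy as np

    def mp_prune(W):
        """Keep only singular values above the Marchenko-Pastur bulk edge."""
        m, n = W.shape
        U, s, Vt = np.linalg.svd(W, full_matrices=False)
        sigma = np.median(s) / np.sqrt(max(m, n))   # crude noise-scale estimate (assumption)
        edge = sigma * (np.sqrt(m) + np.sqrt(n))    # MP upper edge for i.i.d. noise entries
        k = int((s > edge).sum())                   # number of "signal" singular values
        return (U[:, :k] * s[:k]) @ Vt[:k], k

    rng = np.random.default_rng(0)
    signal = rng.normal(size=(256, 8)) @ rng.normal(size=(8, 128))
    W = signal + 0.5 * rng.normal(size=(256, 128))
    W_pruned, k = mp_prune(W)                       # k is close to the true rank 8
    print(k, np.linalg.norm(W_pruned - signal) < np.linalg.norm(W - signal))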
( 2
min )
This paper proposes an efficient optimizer called AdaPlus, which integrates
Nesterov momentum and precise stepsize adjustment on top of AdamW. AdaPlus
combines the advantages of AdamW, Nadam, and AdaBelief and, in particular, does
not introduce any extra hyper-parameters. We perform extensive experimental
evaluations on three machine learning tasks to validate the effectiveness of
AdaPlus. The experimental results validate that AdaPlus (i) performs, among all
the evaluated adaptive methods, most comparably with (and even slightly better
than) SGD with momentum on image classification tasks and (ii) outperforms
other state-of-the-art optimizers on language modeling tasks and exhibits high
stability when training GANs. The experiment code of AdaPlus will
be accessible at: https://github.com/guanleics/AdaPlus.
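Taking the abstract literally, an AdaPlus-style step would combine AdamW's
decoupled weight decay, a Nadam-style Nesterov lookahead on the first moment,
and an AdaBelief-style second moment of the prediction error. The following is
our hedged reconstruction under those assumptions; for the authors' actual
algorithm, see the repository above:

    import numpy as np

    def adaplus_like_step(p, g, m, s, t, lr=1e-3, b1=0.9, b2=0.999, wd=1e-2, eps=1e-8):
        """One illustrative update of parameter p given gradient g and state (m, s, t)."""
        m = b1 * m + (1 - b1) * g                   # first moment
        s = b2 * s + (1 - b2) * (g - m) ** 2        # AdaBelief-style "belief" term
        m_nes = b1 * m + (1 - b1) * g               # Nesterov lookahead (Nadam-style)
        m_hat = m_nes / (1 - b1 ** (t + 1))         # bias corrections
        s_hat = s / (1 - b2 ** (t + 1))
        p = p - lr * wd * p                         # decoupled weight decay (AdamW)
        return p - lr * m_hat / (np.sqrt(s_hat) + eps), m, s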
( 2
min )
The growth of network-connected devices has led to an exponential increase in
data generation, creating significant challenges for efficient data analysis.
This data is generated continuously, creating a dynamic flow known as a data
stream. The characteristics of a data stream may change dynamically, and this
change is known as concept drift. Consequently, a method for handling data
streams must efficiently reduce their volume while dynamically adapting to
these changing characteristics. This paper proposes a simple online vector
quantization method for concept drift. The proposed method identifies and
replaces units with low win probability through remove-birth updating, thus
achieving rapid adaptation to concept drift. Furthermore, the results of this
study show that the proposed method generates few dead units even in the
presence of concept drift. This study also suggests that some metrics
calculated from the proposed method can be helpful for drift detection.
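A minimal sketch of the remove-birth idea as we read it: track a running win
probability per unit; when a unit's win probability drops below a threshold,
remove it and re-create it at the current input. The unit count, learning
rate, and threshold below are illustrative:

    import numpy as np

    class RemoveBirthVQ:
        def __init__(self, dim, n_units=16, lr=0.05, decay=0.99, p_min=0.01, seed=0):
            rng = np.random.default_rng(seed)
            self.units = rng.normal(size=(n_units, dim))
            self.win_prob = np.full(n_units, 1.0 / n_units)  # running win-probability estimate
            self.lr, self.decay, self.p_min = lr, decay, p_min

        def update(self, x):
            winner = np.argmin(np.linalg.norm(self.units - x, axis=1))
            self.units[winner] += self.lr * (x - self.units[winner])
            one_hot = np.zeros(len(self.units))
            one_hot[winner] = 1.0
            self.win_prob = self.decay * self.win_prob + (1 - self.decay) * one_hot
            loser = np.argmin(self.win_prob)
            if self.win_prob[loser] < self.p_min:            # remove-birth step
                self.units[loser] = x.copy()                 # re-born at the current input
                self.win_prob[loser] = 1.0 / len(self.units)
            return winner

    vq = RemoveBirthVQ(dim=2)
    rng = np.random.default_rng(1)
    for t in range(5000):                                    # abrupt drift at t = 2500
        center = np.zeros(2) if t < 2500 else np.full(2, 5.0)
        vq.update(center + rng.normal(size=2))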
( 2
min )
Multi-query attention (MQA), which only uses a single key-value head,
drastically speeds up decoder inference. However, MQA can lead to quality
degradation, and moreover it may not be desirable to train a separate model
just for faster inference. We (1) propose a recipe for uptraining existing
multi-head language model checkpoints into models with MQA using 5% of original
pre-training compute, and (2) introduce grouped-query attention (GQA), a
generalization of multi-query attention that uses an intermediate number of
key-value heads (more than one, but fewer than the number of query heads). We
show that
uptrained GQA achieves quality close to multi-head attention with comparable
speed to MQA.
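The grouping itself is easy to state in code. Below is an illustrative single
attention pass (shapes and names are ours, not the paper's reference
implementation); one key-value head recovers MQA, and as many key-value heads
as query heads recovers standard multi-head attention. In the paper, the
grouped key-value heads are constructed by mean-pooling the original heads'
projections before uptraining.

    import torch

    def grouped_query_attention(q, k, v):
        # q: (batch, n_q, seq, d); k, v: (batch, n_kv, seq, d) with n_kv dividing n_q
        group = q.shape[1] // k.shape[1]            # queries per key-value head
        k = k.repeat_interleave(group, dim=1)       # broadcast each kv head to its group
        v = v.repeat_interleave(group, dim=1)
        att = (q @ k.transpose(-2, -1)) / q.shape[-1] ** 0.5
        return att.softmax(-1) @ v

    q = torch.randn(2, 8, 16, 64)                   # 8 query heads
    k = v = torch.randn(2, 2, 16, 64)               # 2 shared key-value heads (GQA)
    out = grouped_query_attention(q, k, v)          # (2, 8, 16, 64)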
( 2
min )
The Implicitly Normalized Forecaster (INF) algorithm is considered to be an
optimal solution for adversarial multi-armed bandit (MAB) problems. However,
most of the existing complexity results for INF rely on restrictive
assumptions, such as bounded rewards. Recently, a related algorithm was
proposed that works for both adversarial and stochastic heavy-tailed MAB
settings. However, this algorithm fails to fully exploit the available data.
In this paper, we propose a new version of INF called the Implicitly
Normalized Forecaster with clipping (INF-clip) for MAB problems with
heavy-tailed reward distributions. We establish convergence results under mild
assumptions on the rewards distribution and demonstrate that INF-clip is
optimal for linear heavy-tailed stochastic MAB problems and works well for
non-linear ones. Furthermore, we show that INF-clip outperforms the
best-of-both-worlds algorithm in cases where it is difficult to distinguish
between different arms.
( 2
min )
We study the consistency of surrogate risks for robust binary classification.
It is common to learn robust classifiers by adversarial training, which seeks
to minimize the expected $0$-$1$ loss when each example can be maliciously
corrupted within a small ball. We give a simple and complete characterization
of the set of surrogate loss functions that are \emph{consistent}, i.e., that
can replace the $0$-$1$ loss without affecting the minimizing sequences of the
original adversarial risk, for any data distribution. We also prove a
quantitative version of adversarial consistency for the $\rho$-margin loss. Our
results reveal that the class of adversarially consistent surrogates is
substantially smaller than in the standard setting, where many common
surrogates are known to be consistent.
( 2
min )
The current trend in developing machine learning models for reading
comprehension and logical reasoning tasks is focused on improving the models'
abilities to understand and utilize logical rules. This work contributes a
novel loss function and an accompanying model architecture with more
interpretable components than some other models, representing a common
strategy that humans employ when given reading comprehension and logical
reasoning tasks. Our strategy involves emphasizing relative accuracy over
absolute accuracy and can theoretically produce the correct answer with
incomplete knowledge. We examine the effectiveness of this strategy in solving
reading comprehension and logical reasoning questions. The models were
evaluated on the ReClor dataset, a challenging reading comprehension and
logical reasoning benchmark. We propose the polytuplet loss function, which
forces prioritization of learning the relative correctness of answer choices
over learning the true accuracy of each choice. Our results indicate that
models employing polytuplet loss outperform existing baseline models, though
further research is required to quantify the benefits it may present.
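The abstract does not spell out the loss, so the following is a hedged sketch
of one way a loss can prioritize relative correctness: a hinge over (correct,
wrong) score pairs, pushing the model only to rank the correct choice above
the alternatives rather than to calibrate absolute scores. The names and the
margin are ours:

    import torch

    def polytuplet_like_loss(scores, correct_idx, margin=1.0):
        """scores: (batch, n_choices) answer-choice scores; correct_idx: (batch,)."""
        pos = scores.gather(1, correct_idx[:, None])            # score of the correct choice
        mask = torch.ones_like(scores, dtype=torch.bool)
        mask.scatter_(1, correct_idx[:, None], False)
        neg = scores[mask].view(scores.size(0), -1)             # scores of the wrong choices
        return torch.relu(margin - (pos - neg)).sum(1).mean()   # hinge over all pairs

    scores = torch.randn(4, 5, requires_grad=True)
    loss = polytuplet_like_loss(scores, torch.tensor([0, 2, 1, 4]))
    loss.backward()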
( 2
min )
We introduce a new approach for generating sequences of implied volatility
(IV) surfaces across multiple assets that is faithful to historical prices. We
do so using a combination of functional data analysis and neural stochastic
differential equations (SDEs) combined with a probability integral transform
penalty to reduce model misspecification. We demonstrate that learning the
joint dynamics of IV surfaces and prices produces market scenarios that are
consistent with historical features and lie within the sub-manifold of surfaces
that are essentially free of static arbitrage. Finally, we demonstrate that
delta hedging using the simulated surfaces generates profit and loss (P&L)
distributions that are consistent with realised P&Ls.
( 2
min )
Arunachalam and de Wolf (2018) showed that the sample complexity of quantum
batch learning of boolean functions, in the realizable and agnostic settings,
has the same form and order as the corresponding classical sample complexities.
In this paper, we extend this ostensibly surprising message to batch
multiclass learning, online boolean learning, and online multiclass learning.
For our online learning results, we first consider an adaptive adversary
variant of the classical model of Dawid and Tewari (2022). Then, we introduce
the first (to the best of our knowledge) model of online learning with quantum
examples.
( 2
min )
In nonstationary bandit learning problems, the decision-maker must
continually gather information and adapt their action selection as the latent
state of the environment evolves. In each time period, some latent optimal
action maximizes expected reward under the environment state. We view the
optimal action sequence as a stochastic process, and take an
information-theoretic approach to analyze attainable performance. We bound
limiting per-period regret in terms of the entropy rate of the optimal action
process. The bound applies to a wide array of problems studied in the
literature and reflects the problem's information structure through its
information-ratio.
( 2
min )
Federated Learning (FL) and Split Learning (SL) are two popular paradigms of
distributed machine learning. By offloading the computation-intensive portions
to the server, SL is promising for deep model training on resource-constrained
devices, yet it still lacks a rigorous convergence analysis. In this paper, we
derive the convergence guarantees of Sequential SL (SSL, the vanilla case of SL
that conducts the model training in sequence) for strongly/general/non-convex
objectives on heterogeneous data. Notably, the derived guarantees suggest that
SSL is better than Federated Averaging (FedAvg, the most popular algorithm in
FL) on heterogeneous data. We validate the counterintuitive analysis result
empirically on extremely heterogeneous data.
( 2
min )
We study stochastic delayed feedback in general multi-agent sequential
decision making, which includes bandits, single-agent Markov decision processes
(MDPs), and Markov games (MGs). We propose a novel reduction-based framework,
which turns any multi-batched algorithm for sequential decision making with
instantaneous feedback into a sample-efficient algorithm that can handle
stochastic delays in sequential decision making. By plugging different
multi-batched algorithms into our framework, we provide several examples
demonstrating that our framework not only matches or improves existing results
for bandits, tabular MDPs, and tabular MGs, but also provides the first line of
studies on delays in sequential decision making with function approximation. In
summary, we provide a complete set of sharp results for multi-agent sequential
decision making with delayed feedback.
( 2
min )
Understanding the loss of information in spectral analytics is a crucial
first step towards finding root causes for failures and uncertainties using
spectral data in artificial intelligence models built from modern complex data
science applications. Here, we show, from an elementary Shannon entropy model
analysis with quantum statistics of Gaussian-distributed spectral data, that
the relative loss of information from dimensionality reduction due to the
projection of an initial five-dimensional dataset onto two-dimensional diagrams
is less than one percent for small data sets with sample sizes on the order of
a few hundred data samples. From our analysis, we also
conclude that the density and expectation value of the entropy probability
distribution increases with the sample number and sample size using artificial
data models derived from random sampling Monte Carlo simulation methods.
( 2
min )
We present a convolutional framework which significantly reduces the
complexity, and thus the computational effort, for distributed reinforcement
learning control of dynamical systems governed by partial differential
equations (PDEs). Exploiting translational invariances, the high-dimensional
distributed control problem can be transformed into a multi-agent control
problem with many identical, uncoupled agents. Furthermore, using the fact that
information is transported with finite velocity in many cases, the dimension of
the agents' environment can be drastically reduced using a convolution
operation over the state space of the PDE. In this setting, the complexity can
be flexibly adjusted via the kernel width or by using a stride greater than
one. Moreover, scaling from smaller to larger systems -- or the transfer
between different domains -- becomes a straightforward task requiring little
effort. We demonstrate the performance of the proposed framework using several
PDE examples with increasing complexity, where stabilization is achieved by
training a low-dimensional deep deterministic policy gradient agent using
minimal computing resources.
( 2
min )
Effective representation of molecules is a crucial factor affecting the
performance of artificial intelligence models. This study introduces a
flexible, fragment-based, multiscale molecular representation framework called
t-SMILES (tree-based SMILES) with three code algorithms: TSSA (t-SMILES with
Shared Atom), TSDY (t-SMILES with Dummy Atom) and TSID (t-SMILES with ID). It
describes molecules using SMILES-type strings obtained by performing a
breadth-first search on a full binary tree formed from a fragmented molecular
graph. Systematic evaluations using JTVAE, BRICS, MMPA, and Scaffold show the
feasibility of constructing a multilingual molecular description system in
which various descriptions complement each other, enhancing the overall
performance.
Additionally, it exhibits impressive performance on low-resource datasets,
whether the model is original, data augmented, or pre-training fine-tuned. It
significantly outperforms classical SMILES, DeepSMILES, SELFIES and baseline
models in goal-directed tasks. Furthermore, it surpasses state-of-the-art
fragment-, graph-, and SMILES-based approaches on ChEMBL, Zinc, and QM9.
( 2
min )
We consider (nonparametric) sparse additive models (SpAM) for classification.
The design of a SpAM classifier is based on minimizing the logistic loss with
sparse group Lasso/Slope-type penalties on the coefficients of univariate
additive components' expansions in orthonormal series (e.g., Fourier or
wavelets). The resulting classifier is inherently adaptive to the unknown
sparsity and smoothness. We show that, under a certain sparse-group restricted
eigenvalue condition, it is nearly minimax (up to log-factors) simultaneously
across the entire range of analytic, Sobolev and Besov classes. The performance
of the proposed classifier is illustrated on simulated and real-data examples.
( 2
min )
Recently proposed BERT-based evaluation metrics for text generation perform
well on standard benchmarks but are vulnerable to adversarial attacks, e.g.,
relating to information correctness. We argue that this stems (in part) from
the fact that they are models of semantic similarity. In contrast, we develop
evaluation metrics based on Natural Language Inference (NLI), which we deem a
more appropriate modeling choice. We design a preference-based adversarial
attack framework and show that our NLI-based metrics are much more robust to
the attacks than the recent BERT-based metrics. On standard benchmarks, our
NLI-based metrics outperform existing summarization metrics, but perform below
SOTA MT metrics. However, when combining existing metrics with our NLI metrics,
we obtain both higher adversarial robustness (15%-30%) and higher-quality
metrics as measured on standard benchmarks (+5% to 30%).
( 2
min )
The accuracy of tinyML applications is often affected by various
environmental factors, such as noises, location/calibration of sensors, and
time-related changes. This article introduces a neural-network-based on-device
learning (ODL) approach that addresses this issue by retraining in deployed
environments. Our approach relies on semi-supervised sequential training of
multiple neural networks tailored for low-end edge devices. We describe its
algorithm and an implementation on wireless sensor nodes consisting of a
Raspberry Pi Pico and a low-power wireless module. Experiments using
vibration patterns of rotating machines demonstrate that retraining by ODL
improves anomaly detection accuracy compared with a prediction-only deep neural
network in a noisy environment. The results also show that the ODL approach can
save communication cost and energy consumption for battery-powered Internet of
Things devices.
( 2
min )
Most fair machine learning methods either rely heavily on the sensitive
information of the training samples or require large modifications to the
target models, which hinders their practical application. To address this
issue, we propose a two-stage training algorithm named FAIRIF. It minimizes the
loss over the reweighted data set (second stage) where the sample weights are
computed to balance the model performance across different demographic groups
(first stage). FAIRIF can be applied on a wide range of models trained by
stochastic gradient descent without changing the model, while only requiring
group annotations on a small validation set to compute sample weights.
Theoretically, we show that, in the classification setting, three notions of
disparity among different groups can be mitigated by training with the weights.
Experiments on synthetic data sets demonstrate that FAIRIF yields models with
better fairness-utility trade-offs against various types of bias; and on
real-world data sets, we show the effectiveness and scalability of FAIRIF.
Moreover, as evidenced by the experiments with pretrained models, FAIRIF is
able to alleviate the unfairness issue of pretrained models without hurting
their performance.
( 3
min )
Molecular design based on generative models, such as variational autoencoders
(VAEs), has become increasingly popular in recent years due to its efficiency
for exploring high-dimensional molecular space to identify molecules with
desired properties. While the efficacy of the initial model strongly depends on
the training data, the sampling efficiency of the model for suggesting novel
molecules with enhanced properties can be further improved via latent space
optimization. In this paper, we propose a multi-objective latent space
optimization (LSO) method that can significantly enhance the performance of
generative molecular design (GMD). The proposed method adopts an iterative
weighted retraining approach, where the respective weights of the molecules in
the training data are determined by their Pareto efficiency. We demonstrate
that our multi-objective GMD LSO method can significantly improve the
performance of GMD for jointly optimizing multiple molecular properties.
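A minimal sketch of the weighting step as described: molecules are ranked by
Pareto efficiency (non-dominated sorting) over their property vectors, and
rank-based weights favor the best fronts during retraining. The
rank-to-weight mapping below is a common choice in the weighted-retraining
literature and an assumption of ours:

    import numpy as np

    def pareto_ranks(props):
        """Non-dominated sorting rank per molecule; props: (n, k), larger is better."""
        n = len(props)
        ranks, remaining, r = np.zeros(n, dtype=int), set(range(n)), 0
        while remaining:
            front = {i for i in remaining
                     if not any(np.all(props[j] >= props[i]) and np.any(props[j] > props[i])
                                for j in remaining if j != i)}
            for i in front:
                ranks[i] = r
            remaining -= front
            r += 1
        return ranks

    def retraining_weights(props, k=1e-3):
        ranks = pareto_ranks(np.asarray(props))
        w = 1.0 / (k * len(props) + ranks)          # best Pareto fronts weighted most
        return w / w.sum()

    props = [[0.9, 0.2], [0.5, 0.8], [0.4, 0.1]]    # two properties per molecule
    print(retraining_weights(props))                # third molecule is dominated -> lower weight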
( 2
min )
Intent Detection is one of the core tasks of dialog systems. Few-shot Intent
Detection is challenging due to the limited number of annotated utterances for
novel classes. Generalized few-shot intent detection is a more realistic but
challenging setup, which aims to discriminate the joint label space of both
novel intents, which have only a few examples each, and existing intents with
enough labeled data. Large label spaces and small numbers of shots increase the
complexity of the task. In this work, we employ a simple and effective method
based on Natural Language Inference that leverages the semantics in the
class-label names to learn and predict the novel classes. Our method achieves
state-of-the-art results on the 1-shot and 5-shot intent detection tasks, with
gains ranging from 2-8\% points in F1 score on four benchmark datasets. Our
method also outperforms existing approaches in the more practical setting of
generalized few-shot intent detection, with gains of up to 20\% in F1 score. We
show that the
suggested approach performs well across single and multi domain datasets with
the number of class labels from as few as 7 to as high as 150.
( 2
min )
The modeling and control of complex physical systems are essential in
real-world problems. We propose a novel framework that is generally applicable
to solving PDE-constrained optimal control problems by introducing surrogate
models for PDE solution operators with special regularizers. The procedure of
the proposed framework is divided into two phases: solution operator learning
for PDE constraints (Phase 1) and searching for optimal control (Phase 2). Once
the surrogate model is trained in Phase 1, the optimal control can be inferred
in Phase 2 without intensive computations. Our framework can be applied to both
data-driven and data-free cases. We demonstrate the successful application of
our method to various optimal control problems for different control variables
with diverse PDE constraints from the Poisson equation to Burgers' equation.
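The two-phase structure is easy to express. The sketch below assumes a generic
fully connected surrogate and a quadratic tracking objective, both
illustrative (the Phase 1 fitting is elided):

    import torch
    import torch.nn as nn

    surrogate = nn.Sequential(nn.Linear(32, 128), nn.Tanh(), nn.Linear(128, 32))
    # ... Phase 1: fit `surrogate` on (control, PDE solution) pairs, then freeze it ...
    for p in surrogate.parameters():
        p.requires_grad_(False)

    target = torch.zeros(32)                         # desired state
    u = torch.zeros(32, requires_grad=True)          # control variable
    opt = torch.optim.Adam([u], lr=1e-2)
    for _ in range(500):                             # Phase 2: gradient-based control search
        opt.zero_grad()
        loss = ((surrogate(u) - target) ** 2).mean() + 1e-3 * (u ** 2).mean()
        loss.backward()
        opt.step()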
( 2
min )
Reinforcement learning has been used to train policies that outperform even
the best human players in various games. However, a large amount of data is
needed to achieve good performance, which in turn requires building large-scale
frameworks and simulators. In this paper, we study how large-scale
reinforcement learning can be applied to autonomous driving, analyze how the
resulting policies perform as the experiment size is scaled, and what the most
important factors contributing to policy performance are. To do this, we first
introduce a hardware-accelerated autonomous driving simulator, which allows us
to efficiently collect experience from billions of agent steps. This simulator
is paired with a large-scale, multi-GPU reinforcement learning framework. We
demonstrate that simultaneous scaling of dataset size, model size, and agent
steps trained provides increasingly strong driving policies in regard to
collision, traffic rule violations, and progress. In particular, our best
policy reduces the failure rate by 57% while improving progress by 23% compared
to the current state-of-the-art machine learning policies for autonomous
driving.
( 2
min )
Resistor networks have recently attracted a surge of interest as substrates for
energy-efficient self-learning machines. This work studies the computational
capabilities of these resistor networks. We show that electrical networks
composed of voltage sources, linear resistors, diodes, and voltage-controlled
voltage sources (VCVS) can implement any continuous function. To prove this, we
assume that the circuit elements are ideal and that the conductances of
variable resistors and the amplification factors of the VCVS's can take
arbitrary values -- arbitrarily small or arbitrarily large. The constructive
nature of our proof could also inform the design of such self-learning
electrical networks.
( 2
min )
The integration of different imaging modalities, such as structural,
diffusion tensor, and functional magnetic resonance imaging, with deep learning
models has yielded promising outcomes in discerning phenotypic characteristics
and enhancing disease diagnosis. The development of such a technique hinges on
the efficient fusion of heterogeneous multimodal features, which initially
reside within distinct representation spaces. Naively fusing the multimodal
features does not adequately capture the complementary information and could
even produce redundancy. In this work, we present a novel joint self-supervised
and supervised contrastive learning method to learn the robust latent feature
representation from multimodal MRI data, allowing the projection of
heterogeneous features into a shared common space, and thereby amalgamating
both complementary and analogous information across various modalities and
among similar subjects. We performed a comparative analysis between our
proposed method and alternative deep multimodal learning approaches. Through
extensive experiments on two independent datasets, the results demonstrated
that our method is significantly superior to several other deep multimodal
learning methods in predicting abnormal neurodevelopment. Our method has the
capability to facilitate computer-aided diagnosis within clinical practice,
harnessing the power of multimodal data.
( 2
min )
Model stores offer third-party ML models and datasets for easy project
integration, minimizing coding efforts. One might hope to find detailed
specifications of these models and datasets in the documentation, leveraging
documentation standards such as model and dataset cards. In this study, we use
statistical analysis and hybrid card sorting to assess the state of the
practice of documenting model cards and dataset cards in one of the largest
model stores in use today--Hugging Face (HF). Our findings show that only
21,902 models (39.62\%) and 1,925 datasets (28.48\%) have documentation.
Furthermore, we observe inconsistency in ethics- and transparency-related
documentation for ML models and datasets.
( 2
min )
We propose an adaptive model-predictive controller that balances driving the
system to a goal state and seeking system observations that are informative
with respect to the parameters of a nonlinear autoregressive exogenous model.
The controller's objective function is derived from an expected free energy
functional and contains information-theoretic terms expressing uncertainty over
model parameters and output predictions. Experiments illustrate how parameter
uncertainty affects the control objective and evaluate the proposed controller
for a pendulum swing-up task.
( 2
min )
Numerous regularization methods for deformable image registration aim at
enforcing smooth transformations, but are difficult to tune a priori and
lack a clear physical basis. Physically inspired strategies have emerged,
offering a sound theoretical basis, but still necessitating complex
discretization and resolution schemes. This study introduces a regularization
strategy that does not require discretization, making it compatible with
current registration frameworks, while retaining the benefits of physically
motivated regularization for medical image registration. The proposed method
performs favorably in both synthetic and real datasets, exhibiting an accuracy
comparable to current state-of-the-art methods.
( 2
min )
Tensorial neural networks (TNNs) combine the successes of multilinear algebra
with those of deep learning to enable extremely efficient reduced-order models
of high-dimensional problems. Here, I describe a deep neural network
architecture that fuses multiple TNNs into a larger network, intended to solve
a broader class of problems than a single TNN. I evaluate this architecture,
referred to as a "stacked tensorial neural network" (STNN), on a parametric PDE
with three independent variables and three parameters. The three parameters
correspond to one PDE coefficient and two quantities describing the domain
geometry. The STNN provides an accurate reduced-order description of the
solution manifold over a wide range of parameters. There is also evidence of
meaningful generalization to parameter values outside its training data.
Finally, while the STNN architecture is relatively simple and problem agnostic,
it can be regularized to incorporate problem-specific features like symmetries
and physical modeling assumptions.
( 2
min )
In this paper we define a population parameter, ``Generalized Variable
Importance Metric (GVIM)'', to measure the importance of predictors for
black-box machine learning methods, where importance is not represented by a
model-based parameter. GVIM is defined for each input variable, using the true
conditional expectation function, and it measures the variable's importance in
affecting a continuous or a binary response. We extend previously published
results to show that the defined GVIM can be represented as a function of the
Conditional Average Treatment Effect (CATE) for any kind of predictor, which
gives it a causal interpretation and further justification as an alternative to
classical measures of significance that are only available in simple parametric
models. An extensive set of simulations, using realistically complex
relationships between covariates and outcomes and a number of regression
techniques of varying degrees of complexity, demonstrates the performance of
our proposed estimator of the GVIM.
( 2
min )
We introduce a pivot for exact selective inference with randomization. Not
only does our pivot lead to exact inference in Gaussian regression models, but
it is also available in closed form. We reduce the problem of exact selective
inference to a bivariate truncated Gaussian distribution. By doing so, we give
up some power that is achieved with approximate maximum likelihood estimation
in Panigrahi and Taylor (2022). Yet our pivot always produces narrower
confidence intervals than a closely related data splitting procedure. We
investigate the trade-off between power and exact selective inference on
simulated datasets and an HIV drug resistance dataset.
( 2
min )
We consider the problem of sufficient dimension reduction (SDR) for
multi-index models. The estimators of the central mean subspace in prior works
either have slow (non-parametric) convergence rates, or rely on stringent
distributional conditions (e.g., the covariate distribution $P_{\mathbf{X}}$
being elliptical symmetric). In this paper, we show that a fast parametric
convergence rate of form $C_d \cdot n^{-1/2}$ is achievable via estimating the
\emph{expected smoothed gradient outer product}, for a general class of
distributions $P_{\mathbf{X}}$ admitting Gaussian or heavier distributions.
When
the link function is a polynomial with a degree of at most $r$ and
$P_{\mathbf{X}}$ is the standard Gaussian, we show that the prefactor depends
on the ambient dimension $d$ as $C_d \propto d^r$.
( 2
min )
Unsupervised learning has become a staple in classical machine learning,
successfully identifying clustering patterns in data across a broad range of
domain applications. Surprisingly, despite its accuracy and elegant simplicity,
unsupervised learning has not been sufficiently exploited in the realm of
phylogenetic tree inference. The main reason for the delay in adoption of
unsupervised learning in phylogenetics is the lack of a meaningful, yet simple,
way of embedding phylogenetic trees into a vector space. Here, we propose the
simple yet powerful split-weight embedding which allows us to fit standard
clustering algorithms to the space of phylogenetic trees. We show that our
split-weight embedded clustering is able to recover meaningful evolutionary
relationships in simulated and real (Adansonia baobabs) data.
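A minimal sketch of the embedding as described: index the union of splits
(bipartitions of the taxa) seen across trees, place each tree's branch weights
at the matching coordinates with zeros for absent splits, and hand the vectors
to off-the-shelf clustering. The toy trees below are hypothetical:

    import numpy as np
    from sklearn.cluster import KMeans

    def split_weight_embedding(trees, all_splits):
        """trees: list of {split (frozenset of taxa): branch weight} -> (n, d) matrix."""
        index = {s: i for i, s in enumerate(all_splits)}
        X = np.zeros((len(trees), len(all_splits)))
        for r, tree in enumerate(trees):
            for split, w in tree.items():
                X[r, index[split]] = w
        return X

    t1 = {frozenset({"A", "B"}): 0.7, frozenset({"C", "D"}): 0.6}
    t2 = {frozenset({"A", "B"}): 0.8, frozenset({"C", "D"}): 0.5}
    t3 = {frozenset({"A", "C"}): 0.9, frozenset({"B", "D"}): 0.4}
    splits = sorted({s for t in (t1, t2, t3) for s in t}, key=sorted)
    X = split_weight_embedding([t1, t2, t3], splits)
    print(KMeans(n_clusters=2, n_init=10).fit_predict(X))    # groups t1 with t2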
( 2
min )
Predicting audio quality in voice synthesis and conversion systems is a
critical yet challenging task, especially when traditional methods like Mean
Opinion Scores (MOS) are cumbersome to collect at scale. This paper addresses
the gap in efficient audio quality prediction, especially in low-resource
settings where extensive MOS data from large-scale listening tests may be
unavailable. We demonstrate that uncertainty measures derived from
out-of-the-box pretrained self-supervised learning (SSL) models, such as
wav2vec, correlate with MOS scores. These findings are based on data from the
2022 and 2023 VoiceMOS challenges. We explore the extent of this correlation
across different models and language contexts, revealing insights into how
inherent uncertainties in SSL models can serve as effective proxies for audio
quality assessment. In particular, we show that the contrastive wav2vec models
are the most performant in all settings.
( 2
min )
Deep Neural Networks (DNNs) are powerful tools for various computer vision
tasks, yet they often struggle with reliable uncertainty quantification - a
critical requirement for real-world applications. Bayesian Neural Networks
(BNNs) are equipped for uncertainty estimation but struggle to scale to large
DNNs, as they are highly unstable to train. To address this challenge, we
introduce the
Adaptable Bayesian Neural Network (ABNN), a simple and scalable strategy to
seamlessly transform DNNs into BNNs in a post-hoc manner with minimal
computational and training overheads. ABNN preserves the main predictive
properties of DNNs while enhancing their uncertainty quantification abilities
through simple BNN adaptation layers (attached to normalization layers) and a
few fine-tuning steps on pre-trained models. We conduct extensive experiments
across multiple datasets for image classification and semantic segmentation
tasks, and our results demonstrate that ABNN achieves state-of-the-art
performance without the computational budget typically associated with ensemble
methods.
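In the spirit of the adaptation layers described, the sketch below attaches a
learnable Gaussian perturbation to a normalization layer's output; repeated
stochastic forward passes then yield predictive uncertainty. The
parameterization is our illustration, not the authors' exact layer:

    import torch
    import torch.nn as nn

    class BNNAdapter(nn.Module):
        """Multiplies post-norm features by a stochastic scale gamma + sigma * eps."""
        def __init__(self, dim):
            super().__init__()
            self.gamma = nn.Parameter(torch.ones(dim))               # deterministic scale
            self.log_sigma = nn.Parameter(torch.full((dim,), -3.0))  # perturbation scale

        def forward(self, x):
            eps = torch.randn_like(x)
            return x * (self.gamma + eps * self.log_sigma.exp())

    layer = nn.Sequential(nn.LayerNorm(64), BNNAdapter(64))  # only the adapter is fine-tuned
    x = torch.randn(1, 64)
    samples = torch.stack([layer(x) for _ in range(8)])      # Monte Carlo predictive samples
    print(samples.std(0).mean())                             # uncertainty from weight noise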
( 2
min )
NVIDIA’s AI Podcast had its best year yet, with a record-breaking 1.2 million plays in 2023 and each biweekly episode now drawing more than 30,000 listens. Among tech’s top podcasts, the AI Podcast has racked up more than 200 episodes and nearly 5 million total plays since its debut in 2016.
( 5
min )
NVIDIA’s holiday card — enchanting viewers from the perspective of snuggled-up family members on a couch — warmly depicts a crackling fireplace and an NVIDIA robo-dog by the hearth, all framed by a string of sparkling lights.
( 8
min )
Transformers play a central role in the inner workings of large language
models. We develop a mathematical framework for analyzing Transformers based on
their interpretation as interacting particle systems, which reveals that
clusters emerge over long time horizons. Our study explores the underlying
theory and
offers new perspectives for mathematicians as well as computer scientists.
( 2
min )
In this paper we revisit the classical problem of classification, but impose
privacy constraints. Under such constraints, the raw data
$(X_1,Y_1),\ldots,(X_n,Y_n)$ cannot be directly observed, and all classifiers
are functions of the randomised outcome of a suitable local differential
privacy mechanism. The statistician is free to choose the form of this privacy
mechanism, and here we add Laplace distributed noise to a discretisation of the
location of each feature vector $X_i$ and to its label $Y_i$. The
classification rule is the privatized version of the well-studied partitioning
classification rule. In addition to the standard Lipschitz and margin
conditions, a novel characteristic is introduced, by which the exact rate of
convergence of the classification error probability is calculated, both for
non-private and private data.
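A minimal sketch of the mechanism as described; the grid size and noise scales
are illustrative and the formal privacy accounting is omitted. Each feature
vector's grid-cell indicator and its label are released with Laplace noise,
and the partitioning rule then votes per cell:

    import numpy as np

    def privatize(X, y, n_bins=8, eps=1.0, rng=np.random.default_rng(0)):
        """X: (n, d) features assumed in [0, 1]^d; y: (n,) labels in {0, 1}."""
        cells = np.floor(np.clip(X, 0, 1 - 1e-9) * n_bins).astype(int)
        cell_ids = np.ravel_multi_index(cells.T, (n_bins,) * X.shape[1])
        Z = np.zeros((len(X), n_bins ** X.shape[1]))
        Z[np.arange(len(X)), cell_ids] = 1.0
        Z += rng.laplace(scale=2.0 / eps, size=Z.shape)          # noisy location release
        y_priv = y + rng.laplace(scale=2.0 / eps, size=len(y))   # noisy label release
        return Z, y_priv

    def partitioning_classifier(Z, y_priv):
        votes = Z.T @ y_priv                                     # noisy per-cell label mass
        weight = np.maximum(Z.sum(axis=0), 1e-9)                 # noisy per-cell counts
        return lambda cell_id: int(votes[cell_id] / weight[cell_id] > 0.5)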
( 2
min )
In this work, a novel Stackelberg game theoretic framework is proposed for
trading energy bidirectionally between the demand-response (DR) aggregator and
the prosumers. This formulation allows for flexible energy arbitrage and
additional monetary rewards while ensuring that the prosumers' desired daily
energy demand is met. Then, a scalable (linear with the number of prosumers),
decentralized, privacy-preserving algorithm is proposed to find approximate
equilibria with online sampling and learning of the prosumers' cumulative best
response, which finds applications beyond this energy game. Moreover, cost
bounds are provided on the quality of the approximate equilibrium solution.
Finally, real data from the California day-ahead market and the UC Davis campus
building energy demands are utilized to demonstrate the efficacy of the
proposed framework and algorithm.
( 2
min )
This paper explores the application of diffusion maps as graph shift operators
for understanding the underlying geometry of graph signals. The study evaluates
the improvements in graph learning obtained when applying
diffusion-map-generated filters to the Markov Variation minimization problem.
The paper showcases the
effectiveness of this approach through examples involving synthetically
generated and real-world temperature sensor data. These examples also compare
the diffusion map graph signal model with other commonly used graph signal
operators. The results provide new approaches for the analysis and
understanding of complex, non-Euclidean data structures.
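For reference, here is the standard diffusion-map construction underlying the
graph shift operator (a common recipe, sketched with illustrative parameters):
a Gaussian affinity kernel is row-normalized into a Markov matrix P, whose
leading non-trivial eigenvectors give the diffusion coordinates:

    import numpy as np

    def diffusion_map(X, eps=1.0, n_coords=2, t=1):
        d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
        K = np.exp(-d2 / eps)                       # Gaussian affinity kernel
        P = K / K.sum(axis=1, keepdims=True)        # Markov (diffusion) operator
        vals, vecs = np.linalg.eig(P)
        order = np.argsort(-vals.real)              # lambda_0 = 1 is the trivial mode
        idx = order[1:n_coords + 1]
        return (vals.real[idx] ** t) * vecs.real[:, idx], P

    X = np.random.default_rng(0).normal(size=(100, 3))
    coords, P = diffusion_map(X)                    # P can serve as a graph shift operator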
( 2
min )
We propose SutraNets, a novel method for neural probabilistic forecasting of
long-sequence time series. SutraNets use an autoregressive generative model to
factorize the likelihood of long sequences into products of conditional
probabilities. When generating long sequences, most autoregressive approaches
suffer from harmful error accumulation, as well as challenges in modeling
long-distance dependencies. SutraNets treat long, univariate prediction as
multivariate prediction over lower-frequency sub-series. Autoregression
proceeds across time and across sub-series in order to ensure coherent
multivariate (and, hence, high-frequency univariate) outputs. Since sub-series
can be generated using fewer steps, SutraNets effectively reduce error
accumulation and signal path distances. We find SutraNets to significantly
improve forecasting accuracy over competitive alternatives on six real-world
datasets, including when we vary the number of sub-series and scale up the
depth and width of the underlying sequence models.
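The sub-series view is a simple reshaping. In the sketch below (our
illustration of the stated idea), a univariate series of length T becomes an
m-variate series of length T/m whose column j is the lower-frequency
sub-series x[j::m]; autoregressing over this view shortens signal paths by a
factor of m:

    import numpy as np

    def to_subseries(x, m):
        """x: (T,) univariate series -> (T // m, m); column j is sub-series x[j::m]."""
        T = (len(x) // m) * m
        return x[:T].reshape(-1, m)

    def from_subseries(z):
        return z.reshape(-1)                        # interleave back to high frequency

    x = np.arange(12.0)
    z = to_subseries(x, m=3)                        # z[:, 0] == [0, 3, 6, 9]
    assert np.allclose(from_subseries(z), x)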
( 2
min )
Automated machine learning (AutoML) systems propose an end-to-end solution to
a given machine learning problem, creating either fixed or flexible pipelines.
Fixed pipelines are task-independent constructs: their general composition
remains the same, regardless of the data. In contrast, the structure of
flexible pipelines varies depending on the input, making them finely tailored
to individual tasks. However, flexible pipelines can be structurally
overcomplicated and have poor explainability. We propose the EVOSA approach
that compensates for the negative points of flexible pipelines by incorporating
a sensitivity analysis which increases the robustness and interpretability of
the flexible solutions. EVOSA quantitatively estimates the positive and
negative impact of an edge or a node on a pipeline graph, and feeds this
information to the evolutionary AutoML optimizer. The correctness and
efficiency of EVOSA were validated on tabular, multimodal, and computer vision
tasks, suggesting
generalizability of the proposed approach across domains.
( 2
min )
Public release of the weights of pretrained foundation models, otherwise
known as downloadable access \citep{solaiman_gradient_2023}, enables
fine-tuning without the prohibitive expense of pretraining. Our work argues
that increasingly accessible fine-tuning of downloadable models may increase
hazards. First, we highlight research to improve the accessibility of
fine-tuning. We split our discussion into research that A) reduces the
computational cost of fine-tuning and B) improves the ability to share that
cost across more actors. Second, we argue that increasingly accessible
fine-tuning methods may increase hazard through facilitating malicious use and
making oversight of models with potentially dangerous capabilities more
difficult. Third, we discuss potential mitigatory measures, as well as benefits
of more accessible fine-tuning. Given substantial remaining uncertainty about
hazards, we conclude by emphasizing the urgent need for the development of
mitigations.
( 2
min )
The Adam optimizer is a popular choice in contemporary deep learning, due to
its strong empirical performance. However, we observe that in privacy-sensitive
scenarios, the traditional use of Differential Privacy (DP) with the Adam
optimizer leads to sub-optimal performance on several tasks. We find that this
performance degradation is due to a DP bias in Adam's second moment estimator,
introduced by the addition of independent noise in the gradient computation to
enforce DP guarantees. This DP bias leads to a different scaling for
low-variance parameter updates, which is inconsistent with the behavior of
non-private Adam. We propose DP-AdamBC, an optimization algorithm which removes
the bias in the second moment estimation and retrieves the expected behaviour
of Adam. Empirically, DP-AdamBC significantly improves the optimization
performance of DP-Adam by up to 3.5% in final accuracy in image, text, and
graph node classification tasks.
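The correction itself is compact. Under DP-SGD, E[g_noisy^2] = E[g^2] +
sigma_dp^2, so subtracting the known DP noise variance from Adam's
second-moment estimate restores the non-private scaling for low-variance
coordinates. The sketch below is our reading of the abstract, not the authors'
code:

    import numpy as np

    def corrected_second_moment(v_hat, sigma_dp, floor=1e-8):
        """v_hat: bias-corrected second moment of the *noisy* gradients."""
        return np.maximum(v_hat - sigma_dp ** 2, floor)   # remove the DP noise contribution

    # Inside a DP-Adam step, the denominator would then use the corrected estimate:
    # update = lr * m_hat / (np.sqrt(corrected_second_moment(v_hat, sigma_dp)) + eps)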
( 2
min )
Pulmonary Hypertension (PH) is a severe disease characterized by an elevated
pulmonary artery pressure. The gold standard for PH diagnosis is measurement of
mean Pulmonary Artery Pressure (mPAP) during an invasive Right Heart
Catheterization. In this paper, we investigate a noninvasive approach to PH
detection utilizing Magnetic Resonance Imaging, Computer Models, and Machine
Learning. We show, using an ablation study, that physics-informed feature
engineering based on models of blood circulation increases the performance of
Gradient Boosting Decision Trees-based algorithms for classification of PH and
regression of values of mPAP. We compare results of regression (with
thresholding of estimated mPAP) and classification and demonstrate that metrics
achieved in both experiments are comparable. The predicted mPAP values are more
informative to the physicians than the probability of PH returned by
classification models, and they provide an intuitive explanation of the outcome
of
the machine learning model (clinicians are accustomed to the mPAP metric,
contrary to the PH probability).
( 2
min )
The goal of this work is to develop accurate Machine Learning (ML) models for
predicting the assembly axial neutron flux profiles in the SAFARI-1 research
reactor, trained by measurement data from historical cycles. The data-driven
nature of ML models makes them susceptible to uncertainties which are
introduced by sources such as noise in training data, incomplete coverage of
the domain, extrapolation and imperfect model architectures. To this end, we
also aim at quantifying the approximation uncertainties of the ML model
predictions. Previous work using Deep Neural Networks (DNNs) has been
successful for fuel assemblies in SAFARI-1 but less accurate for control
follower assemblies. We therefore aim to improve the ML models
for the control assemblies by a combination of supervised and unsupervised ML
algorithms. The $k$-means and Affinity Propagation unsupervised ML algorithms
are employed to identify clusters in the set of the measured axial neutron flux
profiles. Then, regression-based supervised ML models using DNN (with
prediction uncertainties quantified with Monte Carlo dropout) and Gaussian
Process (GP) are trained for different clusters and the prediction uncertainty
is estimated. It was found that applying the proposed procedure improves the
prediction accuracy for the control assemblies and reduces the prediction
uncertainty. The flux shapes predicted by the DNN and GP models are very close,
and the overall accuracy becomes comparable to that of the fuel assemblies. The
prediction uncertainty is, however, smaller for the GP models.
( 3
min )
The joint source coding and modulation (JSCM) framework was enabled by recent
developments in deep learning, which allow the best compression codes and
modulation schemes to be learned from data automatically, in an end-to-end
fashion. In this paper, we show the existence of a strict tradeoff between
channel rate, distortion, perception, and classification accuracy in a JSCM
scenario. We then propose two image compression methods to navigate that
tradeoff: an inverse-domain generative adversarial network (ID-GAN), which
achieves extreme compression, and a simpler, heuristic method that reveals
insights about the performance of ID-GAN. Experimental results not only
corroborate the theoretical findings, but also demonstrate that the proposed
ID-GAN algorithm significantly improves system performance compared to
traditional separation-based methods and recent deep JSCM architectures.
( 2
min )
On-device training is essential for neural networks (NNs) to continuously
adapt to new online data, but can be time-consuming due to the device's limited
computing power. To speed up on-device training, existing schemes select the
trainable NN portion offline or conduct unrecoverable selection at runtime, but
the evolution of the trainable NN portion is constrained and cannot adapt to
the current need for training. Instead, runtime adaptation of on-device
training
should be fully elastic, i.e., every NN substructure can be freely removed from
or added to the trainable NN portion at any time in training. In this paper, we
present ElasticTrainer, a new technique that enforces such elasticity to
achieve the required training speedup with the minimum NN accuracy loss.
Experiment results show that ElasticTrainer achieves up to 3.5x more training
speedup in wall-clock time and reduces energy consumption by 2x-3x more
compared to the existing schemes, without noticeable accuracy loss.
( 2
min )
To comply with new legal requirements and policies committed to privacy
protection, more and more companies are starting to deploy cross-silo Federated
Learning at global scale, where several clients/silos collaboratively train a
global model under the coordination of a central server. Instead of data
sharing and transmission, clients train models using their private local data
and exchange model updates. However, there is little understanding of the
carbon emission impact of cross-silo Federated Learning due to the lack of
related work. In this study, we first analyze the sustainability aspect of
cross-silo Federated Learning across the AI product life cycle, instead of
focusing only on model training, in comparison with the centralized method. We
propose a more holistic quantitative cost and CO2 emission estimation method
for real-world cross-silo Federated Learning settings. Secondly, we propose a
novel data and application management system using cross-silo Federated
Learning and analytics to make IT companies more sustainable and
cost-effective.
( 2
min )
The growing number of wireless edge devices has magnified challenges
concerning energy, bandwidth, latency, and data heterogeneity. These challenges
have become bottlenecks for distributed learning. To address these issues, this
paper presents a novel approach that ensures energy efficiency for
distributionally robust federated learning (FL) with over-the-air computation
(AirComp). In this context, to effectively balance robustness with energy
efficiency, we introduce a novel client selection method that integrates two
complementary insights: a deterministic one that is designed for energy
efficiency, and a probabilistic one designed for distributional robustness.
Simulation results underscore the efficacy of the proposed algorithm, revealing
its superior performance compared to baselines from both robustness and energy
efficiency perspectives, achieving more than 3-fold energy savings compared to
the considered baselines.
( 2
min )
Operator learning aims to discover properties of an underlying dynamical
system or partial differential equation (PDE) from data. Here, we present a
step-by-step guide to operator learning. We explain the types of problems and
PDEs amenable to operator learning, discuss various neural network
architectures, and explain how to employ numerical PDE solvers effectively. We
also give advice on how to create and manage training data and conduct
optimization. We offer intuition behind the various neural network
architectures employed in operator learning by motivating them from the
point of view of numerical linear algebra.
( 2
min )
In this study, we establish that deep neural networks employing ReLU and
ReLU$^2$ activation functions are capable of representing Lagrange finite
element functions of any order on simplicial meshes across arbitrary
dimensions. We introduce a novel global formulation of the basis functions for
Lagrange elements, grounded in a geometric decomposition of these elements and
leveraging two essential properties of high-dimensional simplicial meshes and
barycentric coordinate functions. This representation theory facilitates a
natural approximation result for such deep neural networks. Our findings
present the first demonstration of how deep neural networks can systematically
generate general continuous piecewise polynomial functions.
( 2
min )
The detection of out-of-distribution data points is a common task in particle
physics. It is used for monitoring complex particle detectors or for
identifying rare and unexpected events that may be indicative of new phenomena
or physics beyond the Standard Model. Recent advances in Machine Learning for
anomaly detection have encouraged the utilization of such techniques on
particle physics problems. This review article provides an overview of the
state-of-the-art techniques for anomaly detection in particle physics using
machine learning. We discuss the challenges associated with anomaly detection
in large and complex data sets, such as those produced by high-energy particle
colliders, and highlight some of the successful applications of anomaly
detection in particle physics experiments.
( 2
min )
Microring resonators (MRRs) are promising devices for time-delay photonic
reservoir computing, but the impact of the different physical effects taking
place in the MRRs on the reservoir computing performance is yet to be fully
understood. We numerically analyze the impact of linear losses, as well as of
the relaxation times of thermo-optic and free-carrier effects, on the
prediction error
of the time-series task NARMA-10. We demonstrate the existence of three
regions, defined by the input power and the frequency detuning between the
optical source and the microring resonance, that reveal the cavity transition
from linear to nonlinear regimes. One of these regions offers very low error in
time-series prediction under relatively low input power and number of nodes
while the other regions either lack nonlinearity or become unstable. This study
provides insight into the design of the MRR and the optimization of its
physical properties for improving the prediction performance of time-delay
reservoir computing.
( 2
min )
Forward invariance is a long-studied property in control theory that is used
to certify that a dynamical system stays within some pre-specified set of
states for all time, and also admits robustness guarantees (e.g., the
certificate holds under perturbations). We propose a general framework for
training and provably certifying robust forward invariance in Neural ODEs. We
apply this framework to provide certified safety in robust continuous control.
To our knowledge, this is the first instance of training Neural ODE policies
with such non-vacuous certified guarantees. In addition, we explore the
generality of our framework by using it to certify adversarial robustness for
image classification.
( 2
min )
Black-box variational inference is widely used in situations where there is
no proof that its stochastic optimization succeeds. We suggest this is due to a
theoretical gap in existing stochastic optimization proofs: namely the
challenge of gradient estimators with unusual noise bounds, and a composite
non-smooth objective. For dense Gaussian variational families, we observe that
existing gradient estimators based on reparameterization satisfy a quadratic
noise bound and give novel convergence guarantees for proximal and projected
stochastic gradient descent using this bound. This provides rigorous guarantees
that methods similar to those used in practice converge on realistic inference
problems.
( 2
min )
Through iterative, cross-disciplinary discussions, we define and propose
next-steps for Human-centered Generative AI (HGAI). We contribute a
comprehensive research agenda that lays out future directions of Generative AI
spanning three levels: aligning with human values; assimilating human intents;
and augmenting human abilities. By identifying these next-steps, we intend to
draw interdisciplinary research teams to pursue a coherent set of emergent
ideas in HGAI, focusing on their topics of interest while maintaining a
coherent big picture of the future work landscape.
( 2
min )
Increased focus on the computational efficiency of NLP systems has motivated
the design of efficient model architectures and improvements to underlying
hardware accelerators. However, the resulting increases in computational
throughput and reductions in floating point operations have not directly
translated to improvements in wall-clock inference latency. We demonstrate that
these discrepancies can be largely attributed to bottlenecks introduced by deep
learning frameworks. We denote this phenomenon as the \textit{framework tax},
and observe that the disparity is growing as hardware speed increases over
time. In this work, we examine this phenomenon through a series of case studies
analyzing the effects of model design decisions, framework paradigms, and
hardware platforms on total model latency. Code is available at
https://github.com/JaredFern/Framework-Tax.
( 2
min )
The recent emergence of large language models (LLMs) shows the potential for
artificial general intelligence, revealing new opportunities in industry 4.0
and smart manufacturing. However, a notable gap exists in applying these LLMs
in industry, primarily due to their training on general knowledge rather than
domain-specific knowledge. Such specialized domain knowledge is vital for
effectively addressing the complex needs of industrial applications. To bridge
this gap, this paper proposes an Industrial Large Knowledge Model (ILKM)
framework, emphasizing its potential to revolutionize the industry in smart
manufacturing. In addition, ILKMs and LLMs are compared from eight
perspectives. Finally, the "6S Principle" is proposed as a guideline for the
development of ILKMs in smart manufacturing.
( 2
min )
Robustness is a fundamental property of machine learning classifiers required
to achieve safety and reliability. In the field of adversarial robustness of
image classifiers, robustness is commonly defined as the stability of a model
to all input changes within a p-norm distance. However, in the field of random
corruption robustness, variations observed in the real world are used, while
p-norm corruptions are rarely considered. This study investigates the use of
random p-norm corruptions to augment the training and test data of image
classifiers. We evaluate the model robustness against imperceptible random
p-norm corruptions and propose a novel robustness metric. We empirically
investigate whether robustness transfers across different p-norms and derive
conclusions on which p-norm corruptions a model should be trained and
evaluated. We find that training data augmentation with a combination of p-norm
corruptions significantly improves corruption robustness, even on top of
state-of-the-art data augmentation schemes.
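One simple way to draw such a corruption (our illustration; the paper's exact sampling scheme may differ) is to sample a random direction and rescale it to a fixed p-norm budget:

    import numpy as np

    def random_pnorm_corruption(x, p=2.0, eps=0.1, rng=None):
        # Draw Gaussian noise, rescale it to have p-norm eps, add it to the
        # image, and clip back to the valid pixel range.
        rng = np.random.default_rng() if rng is None else rng
        delta = rng.standard_normal(x.shape)
        delta *= eps / (np.linalg.norm(delta.ravel(), ord=p) + 1e-12)
        return np.clip(x + delta, 0.0, 1.0)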
( 2
min )
In recent years, the rapid development of deep learning has led to a wide
range of applications in the field of medical image classification. The
variants of neural network models with ever-increasing performance share some
commonalities: they try to mitigate overfitting, improve generalization, and
avoid gradient vanishing and exploding. AlexNet was the first to utilize the
dropout technique to mitigate overfitting and the ReLU activation function to
avoid gradient vanishing. We therefore focus our discussion on AlexNet, which
contributed greatly to the development of CNNs in 2012. After reviewing over 40
papers, including journal papers and conference papers, we give a narrative on
the technical details, advantages, and application areas of AlexNet.
( 2
min )
We present a neural network for mitigating biased errors in pseudoranges to
improve localization performance with data collected from mobile phones. A
satellite-wise Multilayer Perceptron (MLP) is designed to regress the
pseudorange bias correction from six satellite-, receiver-, and context-related
features derived from Android raw Global Navigation Satellite System (GNSS)
measurements. To train the MLP, we carefully calculate the target values of
pseudorange bias using location ground truth and smoothing techniques and
optimize a loss function involving the estimation residuals of smartphone clock
bias. The corrected pseudoranges are then used by a model-based localization
engine to compute locations. The Google Smartphone Decimeter Challenge (GSDC)
dataset, which contains Android smartphone data collected from both rural and
urban areas, is utilized for evaluation. Both fingerprinting and cross-trace
localization results demonstrate that our proposed method outperforms
model-based and state-of-the-art data-driven approaches.
( 2
min )
Acoustic-to-articulatory inversion (AAI) involves mapping from the acoustic
to the articulatory space. Signal-processing features such as MFCCs have been
widely used for the AAI task. For subjects with dysarthric speech, AAI is
challenging because of imprecise and indistinct pronunciation. In this work,
we perform AAI for dysarthric speech using representations from pre-trained
self-supervised learning (SSL) models. We demonstrate the impact of different
pre-trained features on this challenging AAI task, at low-resource conditions.
In addition, we condition the BLSTM network on x-vectors alongside the
extracted SSL features. In the seen case, we experiment with three AAI training
schemes (subject-specific, pooled, and fine-tuned). The results, consistent
across training schemes, reveal that DeCoAR, in the fine-tuned scheme, achieves
a relative improvement of the Pearson Correlation Coefficient (CC) by ~1.81%
and ~4.56% for healthy controls and patients, respectively, over MFCCs. We
observe similar average trends for different SSL features in the unseen case.
Overall, SSL networks like wav2vec, APC, and DeCoAR, trained with feature
reconstruction or future timestep prediction tasks, perform well in predicting
dysarthric articulatory trajectories.
( 2
min )
The success of machine learning (ML) has been accompanied by increased
concerns about its trustworthiness. Several jurisdictions are preparing ML
regulatory frameworks. One such concern is ensuring that model training data
has desirable distributional properties for certain sensitive attributes. For
example, draft regulations indicate that model trainers are required to show
that training datasets have specific distributional properties, such as
reflecting diversity of the population.
We propose the notion of property attestation allowing a prover (e.g., model
trainer) to demonstrate relevant distributional properties of training data to
a verifier (e.g., a customer) without revealing the data. We present an
effective hybrid property attestation combining property inference with
cryptographic mechanisms.
( 2
min )
We study and introduce new gradient operators in the complex and bicomplex
settings, inspired from the well-known Least Mean Square (LMS) algorithm
invented in 1960 by Widrow and Hoff for Adaptive Linear Neuron (ADALINE).
These gradient operators are used to formulate new learning rules for the
Bicomplex Least Mean Square (BLMS) algorithm, and we also formulate these
learning rules for the case of multicomplex LMS algorithms (MLMS). This
approach extends both the classical real and complex LMS algorithms.
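For reference, the classical complex LMS recursion that these rules generalize can be written (in its standard textbook form, not quoted from the paper) as
\[
e_k = d_k - \mathbf{w}_k^{\mathsf{H}}\mathbf{x}_k, \qquad
\mathbf{w}_{k+1} = \mathbf{w}_k + \mu\, e_k^{*}\,\mathbf{x}_k,
\]
where $\mathbf{x}_k$ is the input vector, $d_k$ the desired response, and $\mu$ the step size; the bicomplex and multicomplex learning rules replace the conjugate and inner product with their (multi)complex counterparts.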
( 2
min )
In recent years, generative adversarial networks (GANs) have been used to
supplement datasets within the field of marine bioacoustics. This is driven by
factors such as the cost of data collection, data sparsity, and the need to aid
preprocessing.
One notable challenge with marine bioacoustic data is the low signal-to-noise
ratio (SNR) posing difficulty when applying deep learning techniques such as
GANs. This work investigates the effect SNR has on the audio-based GAN
performance and examines three different evaluation methodologies for GAN
performance, yielding interesting results on the effects of SNR on GANs,
specifically WaveGAN.
( 2
min )
Quantitative markets are characterized by swift dynamics and abundant
uncertainties, making the pursuit of profit-driven stock trading actions
inherently challenging. Within this context, reinforcement learning (RL), which
operates on a reward-centric mechanism for optimal control, has surfaced as a
potentially effective solution to the intricate financial decision-making
conundrums presented. This paper delves into the fusion of two established
financial trading strategies, namely the constant proportion portfolio
insurance (CPPI) and the time-invariant portfolio protection (TIPP), with the
multi-agent deep deterministic policy gradient (MADDPG) framework. As a result,
we introduce two novel multi-agent RL (MARL) methods, CPPI-MADDPG and
TIPP-MADDPG, tailored for probing strategic trading within quantitative
markets. To validate these innovations, we implemented them on a diverse
selection of 100 real-market shares. Our empirical findings reveal that the
CPPI-MADDPG and TIPP-MADDPG strategies consistently outpace their traditional
counterparts, affirming their efficacy in the realm of quantitative trading.
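As background for the strategies being fused, the textbook CPPI rule invests a fixed multiple of the cushion above a protection floor in the risky asset (an illustrative sketch of the classical rule, not the paper's MADDPG variant):

    def cppi_allocation(value, floor, multiplier=3.0):
        # Classic CPPI: risky exposure = multiplier * cushion, capped at total
        # wealth. TIPP differs by ratcheting the floor up as the value grows.
        cushion = max(value - floor, 0.0)
        risky = min(multiplier * cushion, value)
        return risky, value - risky  # (risky exposure, safe-asset allocation)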
( 2
min )
In neural audio signal processing, pitch conditioning has been used to
enhance the performance of synthesizers. However, jointly training pitch
estimators and synthesizers is a challenge when using standard audio-to-audio
reconstruction loss, leading to reliance on external pitch trackers. To address
this issue, we propose using a spectral loss function inspired by optimal
transportation theory that minimizes the displacement of spectral energy. We
validate this approach through an unsupervised autoencoding task that fits a
harmonic template to harmonic signals. We jointly estimate the fundamental
frequency and amplitudes of harmonics using a lightweight encoder and
reconstruct the signals using a differentiable harmonic synthesizer. The
proposed approach offers a promising direction for improving unsupervised
parameter estimation in neural audio applications.
( 2
min )
Photo-trapping cameras are widely employed for wildlife monitoring. Those
cameras take photographs when motion is detected to capture images where
animals appear. A significant portion of these images is empty: no wildlife
appears in the image. Filtering out those images is not a trivial task since it
requires hours of manual work from biologists. Therefore, there is a notable
interest in automating this task. Automatic discarding of empty photo-trapping
images is still an open field in the area of Machine Learning. Existing
solutions often rely on state-of-the-art supervised convolutional neural
networks that require the annotation of the images in the training phase.
PARDINUS (Weakly suPervised discARDINg of photo-trapping empty images based on
aUtoencoderS) is constructed on the foundation of weakly supervised learning
and proves that this approach equals or even surpasses other fully supervised
methods that require further labeling work.
( 2
min )
Submodular maximization over a matroid constraint is a fundamental problem
with various applications in machine learning. Some of these applications
involve decision-making over datapoints with sensitive attributes such as
gender or race. In such settings, it is crucial to guarantee that the selected
solution is fairly distributed with respect to this attribute. Recently,
fairness has been investigated in submodular maximization under a cardinality
constraint in both the streaming and offline settings; however, the more
general problem with a matroid constraint has only been considered in the
streaming
setting and only for monotone objectives. This work fills this gap. We propose
various algorithms and impossibility results offering different trade-offs
between quality, fairness, and generality.
( 2
min )
The increasing reliance of drivers on navigation applications has made
transportation networks more susceptible to data-manipulation attacks by
malicious actors. Adversaries may exploit vulnerabilities in the data
collection or processing of navigation services to inject false information,
and to thus interfere with the drivers' route selection. Such attacks can
significantly increase traffic congestions, resulting in substantial waste of
time and resources, and may even disrupt essential services that rely on road
networks. To assess the threat posed by such attacks, we introduce a
computational framework to find worst-case data-injection attacks against
transportation networks. First, we devise an adversarial model with a threat
actor who can manipulate drivers by increasing the travel times that they
perceive on certain roads. Then, we employ hierarchical multi-agent
reinforcement learning to find an approximate optimal adversarial strategy for
data manipulation. We demonstrate the applicability of our approach through
simulating attacks on the Sioux Falls, ND network topology.
( 2
min )
In this work, we propose REBEL, an algorithm for sample efficient reward
regularization based robotic reinforcement learning from human feedback
(RRLHF). Reinforcement learning (RL) performance for continuous control
robotics tasks is sensitive to the underlying reward function. In practice, the
reward function often ends up misaligned with human intent, values, social
norms, etc., leading to catastrophic failures in the real world. We leverage
human preferences to learn regularized reward functions and eventually align
the agents with the true intended behavior. We introduce a novel notion of
reward regularization to the existing RRLHF framework, termed agent
preferences. Thus, we not only consider human feedback in terms of preferences
but also propose to take into account the preference of the underlying RL agent
while learning the reward function. We show that this helps to mitigate the
over-optimization associated with the design of reward functions in RL. We
experimentally show that REBEL exhibits up to 70% improvement in sample
efficiency to achieve a similar level of episodic reward returns as compared to
the state-of-the-art methods such as PEBBLE and PEBBLE+SURF.
( 2
min )
Traditional spectral energy distribution (SED) fitting techniques face
uncertainties due to assumptions in star formation histories and dust
attenuation curves. We propose an advanced machine learning-based approach that
enhances flexibility and uncertainty quantification in SED fitting. Unlike the
fixed NGBoost model used in mirkwood, our approach allows for any
sklearn-compatible model, including deterministic models. We incorporate
conformalized quantile regression to convert point predictions into error bars,
enhancing interpretability and reliability. Using CatBoost as the base
predictor, we compare results with and without conformal prediction,
demonstrating improved performance using metrics such as coverage and interval
width. Our method offers a more versatile and accurate tool for deriving galaxy
physical properties from observational data.
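A minimal sketch of the conformalized-quantile-regression step (using sklearn's gradient boosting as a stand-in for the CatBoost base predictor; hyperparameters are placeholders):

    import numpy as np
    from sklearn.ensemble import GradientBoostingRegressor

    def cqr_intervals(X_tr, y_tr, X_cal, y_cal, X_test, alpha=0.1):
        # Fit lower/upper quantile regressors, then widen them by the conformal
        # quantile of calibration scores to get valid (1 - alpha) intervals.
        lo = GradientBoostingRegressor(loss="quantile", alpha=alpha / 2).fit(X_tr, y_tr)
        hi = GradientBoostingRegressor(loss="quantile", alpha=1 - alpha / 2).fit(X_tr, y_tr)
        scores = np.maximum(lo.predict(X_cal) - y_cal, y_cal - hi.predict(X_cal))
        level = min(np.ceil((1 - alpha) * (len(y_cal) + 1)) / len(y_cal), 1.0)
        q = np.quantile(scores, level)
        return lo.predict(X_test) - q, hi.predict(X_test) + q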
( 2
min )
In this work, we introduce an innovative autoregressive model leveraging
Generative Pretrained Transformer (GPT) architectures, tailored for fraud
detection in payment systems. Our approach innovatively confronts token
explosion and reconstructs behavioral sequences, providing a nuanced
understanding of transactional behavior through temporal and contextual
analysis. Utilizing unsupervised pretraining, our model excels in feature
representation without the need for labeled data. Additionally, we integrate a
differential convolutional approach to enhance anomaly detection, bolstering
the security and efficacy of one of the largest online payment merchants in
China. The scalability and adaptability of our model promise broad
applicability in various transactional contexts.
( 2
min )
Machine learning places the greatest demand on today's computing. This paper
analyzes three machine learning algorithms: transformers, spatial convolution,
and FFT. The analysis is novel in three aspects. First, it measures the cost of
memory access on an abstract memory hierarchy, instead of traditional time or
space complexity. Second, the analysis is asymptotic and identifies the primary
sources of the memory cost. Finally, the result is symbolic and can be used to
select algorithmic parameters, such as the group size in grouped-query
attention (for any dimension size and number of heads) and the batch size in
batched convolution (for any image size and kernel size).
( 2
min )
Training large foundation models using self-supervised objectives on
unlabeled data, followed by fine-tuning on downstream tasks, has emerged as a
standard procedure. Unfortunately, the efficacy of this approach is often
constrained by both limited fine-tuning compute and scarcity in labeled
downstream data. We introduce Multimodal Attention Merging (MAM), an approach
that facilitates direct knowledge transfer from the attention matrices of
models rooted in high-resource modalities (text and images) to those in
resource-constrained domains (speech and audio), employing a zero-shot paradigm.
MAM reduces the relative Word Error Rate (WER) of an Automatic Speech
Recognition (ASR) model by up to 6.70%, and relative classification error of an
Audio Event Classification (AEC) model by 10.63%. In cases where some
data/compute is available, we present Learnable-MAM, a data-driven approach to
merging attention matrices, resulting in a further 2.90% relative reduction in
WER for ASR and 18.42% relative reduction in AEC compared to fine-tuning.
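A hypothetical sketch of the core idea, reduced to linear interpolation of corresponding attention projection matrices (the paper's exact merging scheme and the mixing weight lam are assumptions here):

    import torch

    def merge_attention(w_audio, w_text, lam=0.1):
        # Pull an audio model's attention projection matrix toward the
        # corresponding matrix of a high-resource text model; shapes must match.
        return (1.0 - lam) * w_audio + lam * w_text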
( 2
min )
Gate-defined quantum dots are a promising candidate system to realize
scalable, coupled qubit systems and serve as a fundamental building block for
quantum computers. However, present-day quantum dot devices suffer from
imperfections that must be accounted for, which hinders the characterization,
tuning, and operation process. Moreover, with an increasing number of quantum
dot qubits, the relevant parameter space grows sufficiently to make heuristic
control infeasible. Thus, it is imperative that reliable and scalable
autonomous tuning approaches are developed. In this report, we outline current
challenges in automating quantum dot device tuning and operation with a
particular focus on datasets, benchmarking, and standardization. We also
present ideas put forward by the quantum dot community on how to overcome them.
( 2
min )
Discovering mathematical models that characterize the observed behavior of
dynamical systems remains a major challenge, especially for systems in a
chaotic regime. The challenge is even greater when the physics underlying such
systems is not yet understood, and scientific inquiry must solely rely on
empirical data. Driven by the need to fill this gap, we develop a framework
that learns mathematical expressions modeling complex dynamical behaviors by
identifying differential equations from noisy and sparse observable data. We
train a small neural network to learn the dynamics of a system, its rate of
change in time, and missing model terms, which are used as input for a symbolic
regression algorithm to autonomously distill the explicit mathematical terms.
This, in turn, enables us to predict the future evolution of the dynamical
behavior. The performance of this framework is validated by recovering the
right-hand sides and unknown terms of certain complex, chaotic systems such as
the well-known Lorenz system, a six-dimensional hyperchaotic system, and the
non-autonomous Sprott chaotic system, and comparing them with their known
analytical expressions.
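A minimal sketch of the first stage under stated assumptions (a toy trajectory, a small sklearn network as the dynamics learner, and a separate symbolic regression tool such as PySR consuming its outputs):

    import numpy as np
    from sklearn.neural_network import MLPRegressor

    t = np.linspace(0.0, 10.0, 2000)
    x = np.stack([np.sin(t), np.cos(t)], axis=1)   # stand-in observed trajectory
    dxdt = np.gradient(x, t, axis=0)               # noisy derivative targets
    # Fit a small network to map states to their rates of change.
    net = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=2000).fit(x, dxdt)
    # net.predict(x) now yields denoised right-hand-side samples that a
    # symbolic regression algorithm can distill into explicit terms.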
( 2
min )
The surge in high-throughput omics data has reshaped the landscape of
biological research, underlining the need for powerful, user-friendly data
analysis and interpretation tools. This paper presents GenoCraft, a web-based
comprehensive software solution designed to handle the entire pipeline of omics
data processing. GenoCraft offers a unified platform featuring advanced
bioinformatics tools, covering all aspects of omics data analysis. It
encompasses a range of functionalities, such as normalization, quality control,
differential analysis, network analysis, pathway analysis, and diverse
visualization techniques. This software makes state-of-the-art omics data
analysis more accessible to a wider range of users. With GenoCraft, researchers
and data scientists have access to an array of cutting-edge bioinformatics
tools under a user-friendly interface, making it a valuable resource for
managing and analyzing large-scale omics data. The API, with an interactive web
interface, is publicly available at https://genocraft.stanford.edu/. We also
release all the code at https://github.com/futianfan/GenoCraft.
( 2
min )
In this paper we revisit the classical problem of classification, but impose
privacy constraints. Under such constraints, the raw data
$(X_1,Y_1),\ldots,(X_n,Y_n)$ cannot be directly observed, and all classifiers
are functions of the randomised outcome of a suitable local differential
privacy mechanism. The statistician is free to choose the form of this privacy
mechanism, and here we add Laplace distributed noise to a discretisation of the
location of each feature vector $X_i$ and to its label $Y_i$. The
classification rule is the privatized version of the well-studied partitioning
classification rule. In addition to the standard Lipschitz and margin
conditions, a novel characteristic is introduced, by which the exact rate of
convergence of the classification error probability is calculated, both for
non-private and private data.
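A minimal sketch of the local privatization step (illustrative noise calibration; the paper ties the scales precisely to the privacy budget):

    import numpy as np

    def privatize(cell_onehot, label, epsilon, rng=None):
        # Add Laplace noise to the one-hot encoding of the feature's grid cell
        # and to the +/-1 label; the budget is split between the two releases.
        rng = np.random.default_rng() if rng is None else rng
        noisy_cell = cell_onehot + rng.laplace(scale=4.0 / epsilon, size=cell_onehot.shape)
        noisy_label = label + rng.laplace(scale=4.0 / epsilon)
        return noisy_cell, noisy_label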
( 2
min )
The use of autonomous robots for assistance tasks in hospitals has the
potential to free up qualified staff and improve patient care. However, the
ubiquity of deformable and transparent objects in hospital settings poses
significant challenges to vision-based perception systems. We present
EfficientPPS, a neural architecture for part-aware panoptic segmentation that
provides robots with semantically rich visual information for grasping and
manipulation tasks. We also present an unsupervised data collection and
labelling method to reduce the need for human involvement in the training
process. EfficientPPS is evaluated on a dataset containing real-world hospital
objects and demonstrated to be robust and efficient in grasping transparent
transfusion bags with a collaborative robot arm.
( 2
min )
In this paper, we study the collaborative learning model, which concerns the
tradeoff between parallelism and communication overhead in multi-agent
multi-armed bandits. For regret minimization in multi-armed bandits, we present
the first set of tradeoffs between the number of rounds of communication among
the agents and the regret of the collaborative learning process.
( 2
min )
We introduce pixelSplat, a feed-forward model that learns to reconstruct 3D
radiance fields parameterized by 3D Gaussian primitives from pairs of images.
Our model features real-time and memory-efficient rendering for scalable
training as well as fast 3D reconstruction at inference time. To overcome local
minima inherent to sparse and locally supported representations, we predict a
dense probability distribution over 3D and sample Gaussian means from that
probability distribution. We make this sampling operation differentiable via a
reparameterization trick, allowing us to back-propagate gradients through the
Gaussian splatting representation. We benchmark our method on wide-baseline
novel view synthesis on the real-world RealEstate10k and ACID datasets, where
we outperform state-of-the-art light field transformers and accelerate
rendering by 2.5 orders of magnitude while reconstructing an interpretable and
editable 3D radiance field.
( 2
min )
To better understand the outputs of deep neural networks (DNNs),
attribution-based methods have become an important approach to model
interpretability; they assign a score to each input dimension to indicate its
importance to the model outcome. Notably, attribution methods use the axioms of
sensitivity
and implementation invariance to ensure the validity and reliability of
attribution results. Yet, the existing attribution methods present challenges
for effective interpretation and efficient computation. In this work, we
introduce MFABA, a novel axiom-adhering attribution algorithm for interpreting
DNNs. Additionally, we provide a theoretical proof and in-depth analysis of the
MFABA algorithm and conduct a large-scale experiment. The results demonstrate
its superiority, with speeds more than 101.5142 times faster than
state-of-the-art attribution algorithms. The effectiveness of
MFABA is thoroughly evaluated through the statistical analysis in comparison to
other methods, and the full implementation package is open-source at:
https://github.com/LMBTough/MFABA
( 2
min )
This study explores the application of anomaly detection (AD) methods in
imbalanced learning tasks, focusing on fraud detection using real online credit
card payment data. We assess the performance of several recent AD methods and
compare their effectiveness against standard supervised learning methods.
Offering evidence of distribution shift within our dataset, we analyze its
impact on the tested models' performances. Our findings reveal that LightGBM
exhibits significantly superior performance across all evaluated metrics but
suffers more from distribution shifts than AD methods. Furthermore, our
investigation reveals that LightGBM also captures the majority of frauds
detected by AD methods. This observation challenges the potential benefits of
ensemble methods combining supervised and AD approaches to enhance
performance. In summary, this research provides practical insights into the
utility of these techniques in real-world scenarios, showing LightGBM's
superiority in fraud detection while highlighting challenges related to
distribution shifts.
( 2
min )
Bayesian optimization (BO) is a sample-efficient method and has been widely
used for optimizing expensive black-box functions. Recently, there has been
considerable interest in the BO literature in optimizing functions that are
affected by a context variable in the environment that is uncontrollable by
decision makers. In this paper, we focus on optimizing a function's expectation
over a continuous context variable with an unknown distribution. To address
this problem, we propose two algorithms that employ kernel density estimation
to learn the probability density function (PDF) of the continuous context
variable online. The first algorithm is simpler and directly optimizes the
expectation under the estimated PDF. Considering that
the estimated PDF may have high estimation error when the true distribution is
complicated, we further propose the second algorithm that optimizes the
distributionally robust objective. Theoretical results demonstrate that both
algorithms have sub-linear Bayesian cumulative regret on the expectation
objective. Furthermore, we conduct numerical experiments to empirically
demonstrate the effectiveness of our algorithms.
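A minimal sketch of the density-estimation step for a scalar context (our illustration; posterior_mean stands in for the surrogate's prediction used by the acquisition):

    import numpy as np
    from scipy.stats import gaussian_kde

    def expected_objective(posterior_mean, x, contexts_seen, context_grid):
        # Estimate the context PDF online from observed contexts, then
        # approximate E_c[f(x, c)] by a weighted sum over a context grid.
        weights = gaussian_kde(contexts_seen)(context_grid)
        weights /= weights.sum()
        return np.sum(weights * posterior_mean(x, context_grid))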
( 2
min )
Policy gradient methods enjoy strong practical performance in numerous tasks
in reinforcement learning. Their theoretical understanding in multiagent
settings, however, remains limited, especially beyond two-player competitive
and potential Markov games. In this paper, we develop a new framework to
characterize optimistic policy gradient methods in multi-player Markov games
with a single controller. Specifically, under the further assumption that the
game exhibits an equilibrium collapse, in that the marginals of coarse
correlated equilibria (CCE) induce Nash equilibria (NE), we show convergence to
stationary $\epsilon$-NE in $O(1/\epsilon^2)$ iterations, where $O(\cdot)$
suppresses polynomial factors in the natural parameters of the game. Such an
equilibrium collapse is well-known to manifest itself in two-player zero-sum
Markov games, but also occurs even in a class of multi-player Markov games with
separable interactions, as established by recent work. As a result, we bypass
known complexity barriers for computing stationary NE when either of our
assumptions fails. Our approach relies on a natural generalization of the
classical Minty property that we introduce, which we anticipate to have further
applications beyond Markov games.
( 2
min )
Tabular data analysis is crucial in various fields, and large language models
show promise in this area. However, current research mostly focuses on
rudimentary tasks like Text2SQL and TableQA, neglecting advanced analysis like
forecasting and chart generation. To address this gap, we developed the
Text2Analysis benchmark, incorporating advanced analysis tasks that go beyond
the SQL-compatible operations and require more in-depth analysis. We also
develop five innovative and effective annotation methods, harnessing the
capabilities of large language models to enhance data quality and quantity.
Additionally, we include unclear queries that resemble real-world user
questions to test how well models can understand and tackle such challenges.
Finally, we collect 2249 query-result pairs with 347 tables. We evaluate five
state-of-the-art models using three different metrics, and the results show
that our benchmark introduces considerable challenges in the field of
tabular data analysis, paving the way for more advanced research opportunities.
( 2
min )
Recently, the multi-armed bandit problem has arisen in many real-life scenarios
where arms must be sampled in batches, owing to the limited time an agent can
wait for feedback. Such applications include biological experimentation and
online
marketing. The problem is further complicated when the number of arms is large
and the number of batches is small. We consider pure exploration in a batched
multi-armed bandit problem. We introduce a general linear programming framework
that can incorporate objectives of different theoretical settings in best arm
identification. The linear program leads to a two-stage algorithm that can
achieve good theoretical properties. We demonstrate by numerical studies that
the algorithm also has good performance compared to certain UCB-type or
Thompson sampling methods.
( 2
min )
Uncertainty estimation is a key issue when considering the application of
deep neural network methods in science and engineering. In this work, we
introduce a novel algorithm that quantifies epistemic uncertainty via Monte
Carlo sampling from a tempered posterior distribution. It combines the well
established Metropolis Adjusted Langevin Algorithm (MALA) with momentum-based
optimization using Adam and leverages a prolate proposal distribution, to
efficiently draw from the posterior. We prove that the constructed chain admits
the Gibbs posterior as an invariant distribution and converges to this Gibbs
posterior in total variation distance. Numerical evaluations are postponed to a
first revision.
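For reference, one standard MALA step, without the paper's Adam-based preconditioning or prolate proposal (those are its novel ingredients), looks as follows:

    import numpy as np

    def mala_step(x, log_post, grad_log_post, step, rng):
        # Langevin proposal followed by a Metropolis-Hastings correction.
        prop = (x + step * grad_log_post(x)
                + np.sqrt(2.0 * step) * rng.standard_normal(x.shape))

        def log_q(a, b):
            # Log proposal density (up to a constant) of moving to a from b.
            d = a - b - step * grad_log_post(b)
            return -np.sum(d * d) / (4.0 * step)

        log_alpha = log_post(prop) - log_post(x) + log_q(x, prop) - log_q(prop, x)
        return prop if np.log(rng.uniform()) < log_alpha else x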
( 2
min )
Pufferfish privacy is a flexible generalization of differential privacy that
allows modeling of arbitrary secrets and of the adversary's prior knowledge
about the
data. Unfortunately, designing general and tractable Pufferfish mechanisms that
do not compromise utility is challenging. Furthermore, this framework does not
provide the composition guarantees needed for a direct use in iterative machine
learning algorithms. To mitigate these issues, we introduce a R\'enyi
divergence-based variant of Pufferfish and show that it allows us to extend the
applicability of the Pufferfish framework. We first generalize the Wasserstein
mechanism to cover a wide range of noise distributions and introduce several
ways to improve its utility. We also derive stronger guarantees against
out-of-distribution adversaries. Finally, as an alternative to composition, we
prove privacy amplification results for contractive noisy iterations and
showcase the first use of Pufferfish in private convex optimization. A common
ingredient underlying our results is the use and extension of shift reduction
lemmas.
( 2
min )
Convex clustering is a modern method with both hierarchical and $k$-means
clustering characteristics. Although convex clustering can capture complex
clustering structures hidden in data, the existing convex clustering algorithms
are not scalable to large data sets with sample sizes greater than several
thousands. Moreover, it is known that convex clustering sometimes fails to
produce a complete hierarchical clustering structure. This issue arises if
clusters split up or the minimum number of possible clusters is larger than the
desired number of clusters. In this paper, we propose convex clustering through
majorization-minimization (CCMM) -- an iterative algorithm that uses cluster
fusions and a highly efficient updating scheme derived using diagonal
majorization. Additionally, we explore different strategies to ensure that the
hierarchical clustering structure terminates in a single cluster. With a
current desktop computer, CCMM efficiently solves convex clustering problems
featuring over one million objects in seven-dimensional space, achieving a
solution time of 51 seconds on average.
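For context, the convex clustering problem that CCMM solves can be written in its standard form as
\[
\min_{\mathbf{u}_1,\dots,\mathbf{u}_n}\ \frac{1}{2}\sum_{i=1}^{n}\lVert \mathbf{x}_i - \mathbf{u}_i\rVert_2^2 \;+\; \lambda \sum_{i<j} w_{ij}\,\lVert \mathbf{u}_i - \mathbf{u}_j\rVert_2,
\]
where each observation $\mathbf{x}_i$ has a representative $\mathbf{u}_i$, and increasing $\lambda$ fuses representatives into progressively fewer clusters.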
( 2
min )
This short note describes and proves a connectedness property which was
introduced in Blocher et al. [2023] in the context of data depth functions for
partial orders. The connectedness property gives a structural insight into
union-free generic sets. These sets, presented in Blocher et al. [2023], are
defined by using a closure operator on the set of all partial orders which
naturally appears within the theory of formal concept analysis. In the language
of formal concept analysis, the property of connectedness can be vividly
proven. However, since within Blocher et al. [2023] we did not discuss formal
concept analysis, we outsourced the proof to this note.
( 2
min )
This paper considers the epistemic justification for a simplicity preference
in inductive inference that may be obtained from the machine learning framework
of statistical learning theory. Uniting elements from both earlier arguments
suggesting and rejecting such a justification, the paper spells out a qualified
means-ends and model-relative justificatory argument, built on statistical
learning theory's central mathematical learning guarantee for the method of
empirical risk minimization.
( 2
min )
Generating counterfactual explanations is one of the most effective
approaches for uncovering the inner workings of black-box neural network models
and building user trust. While remarkable strides have been made in generative
modeling using diffusion models in domains like vision, their utility in
generating counterfactual explanations in structured modalities remains
unexplored. In this paper, we introduce Structured Counterfactual Diffuser or
SCD, the first plug-and-play framework leveraging diffusion for generating
counterfactual explanations in structured data. SCD learns the underlying data
distribution via a diffusion model which is then guided at test time to
generate counterfactuals for any arbitrary black-box model, input, and desired
prediction. Our experiments show that our counterfactuals not only exhibit high
plausibility compared to the existing state-of-the-art but also show
significantly better proximity and diversity.
( 2
min )
Recent works have shown that physics-inspired architectures allow the
training of deep graph neural networks (GNNs) without oversmoothing. The role
of these physics is unclear, however, with successful examples of both
reversible (e.g., Hamiltonian) and irreversible (e.g., diffusion) phenomena
producing comparable results despite diametrically opposed mechanisms, and
further complications arising due to empirical departures from mathematical
theory. This work presents a series of novel GNN architectures based upon
structure-preserving bracket-based dynamical systems, which are provably
guaranteed to either conserve energy or generate positive dissipation with
increasing depth. It is shown that the theoretically principled framework
employed here allows for inherently explainable constructions, which
contextualize departures from theory in current architectures and better
elucidate the roles of reversibility and irreversibility in network
performance.
( 2
min )
Accurate land use maps, describing the territory from an anthropic
utilisation point of view, are useful tools for land management and planning.
To produce them, the use of optical images alone remains limited. It is
therefore necessary to make use of several heterogeneous sources, each carrying
complementary or contradictory information due to their imperfections or their
different specifications. This study compares two different approaches, i.e., a
pre-classification and a post-classification fusion approach, for combining
several sources of spatial data in the context of land use classification. The
approaches are applied on authoritative land use data located in the Gers
department in the southwest of France. Pre-classification fusion, while not
explicitly modeling imperfections, has the best final results, reaching an
overall accuracy of 97% and a macro-mean F1 score of 88%.
( 2
min )
Existing algorithms for reinforcement learning from human feedback (RLHF) can
incentivize responses at odds with preferences because they are based on models
that assume independence of irrelevant alternatives (IIA). The perverse
incentives induced by IIA give rise to egregious behavior when innovating on
query formats or learning algorithms.
( 2
min )
We tackle the problem of sampling from intractable high-dimensional density
functions, a fundamental task that often appears in machine learning and
statistics. We extend recent sampling-based approaches that leverage controlled
stochastic processes to model approximate samples from these target densities.
The main drawback of these approaches is that the training objective requires
full trajectories to compute, resulting in sluggish credit assignment because
the learning signal is present only at the terminal time. In this work, we
present Diffusion Generative Flow Samplers
(DGFS), a sampling-based framework where the learning process can be tractably
broken down into short partial trajectory segments, via parameterizing an
additional "flow function". Our method takes inspiration from the theory
developed for generative flow networks (GFlowNets), allowing us to make use of
intermediate learning signals. Through various challenging experiments, we
demonstrate that DGFS achieves more accurate estimates of the normalization
constant than closely-related prior methods.
( 2
min )
Through this paper, we introduce a novel driver cognitive load assessment
dataset, CL-Drive, which contains Electroencephalogram (EEG) signals along with
other physiological signals such as Electrocardiography (ECG) and Electrodermal
Activity (EDA) as well as eye tracking data. The data was collected from 21
subjects while driving in an immersive vehicle simulator, in various driving
conditions, to induce different levels of cognitive load in the subjects. The
tasks consisted of 9 complexity levels for 3 minutes each. Each driver reported
their subjective cognitive load every 10 seconds throughout the experiment. The
dataset contains the subjective cognitive load recorded as ground truth. In
this paper, we also provide benchmark classification results for different
machine learning and deep learning models for both binary and ternary label
distributions. We followed two evaluation criteria, namely 10-fold
cross-validation and leave-one-subject-out (LOSO). We trained our models on
both hand-crafted
features as well as on raw data.
( 3
min )
The effectiveness of digital treatments can be measured by requiring patients
to self-report their state through applications; however, this can be
overwhelming and cause disengagement. We conduct a study to explore the impact
of gamification on self-reporting. Our approach involves the creation of a
system to assess cognitive load (CL) through the analysis of
photoplethysmography (PPG) signals. The data from 11 participants is utilized
to train a machine learning model to detect CL. Subsequently, we create two
versions of surveys: a gamified and a traditional one. We estimate the CL
experienced by 13 other participants while they complete the surveys. We find
that
CL detector performance can be enhanced via pre-training on stress detection
tasks. For 10 out of 13 participants, a personalized CL detector can achieve an
F1 score above 0.7. We find no difference between the gamified and non-gamified
surveys in terms of CL but participants prefer the gamified version.
( 3
min )
Multi-relational clustering is a challenging task because the diverse semantic
information conveyed in multi-layer graphs is difficult to
extract and fuse. Recent methods integrate topology structure and node
attribute information through graph filtering. However, they often use a
low-pass filter without fully considering the correlation among multiple
graphs. To overcome this drawback, we propose to learn a graph filter motivated
by the theoretical analysis of Barlow Twins. We find that input with a negative
semi-definite inner product provides a lower bound for Barlow Twins loss, which
prevents it from reaching a better solution. We thus learn a filter that yields
an upper bound for Barlow Twins. Afterward, we design a simple clustering
architecture and demonstrate its state-of-the-art performance on four benchmark
datasets.
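For reference, the Barlow Twins loss underlying this analysis is, in its standard form,
\[
\mathcal{L}_{\mathrm{BT}} = \sum_{i}\bigl(1 - \mathcal{C}_{ii}\bigr)^2 + \lambda \sum_{i}\sum_{j\neq i}\mathcal{C}_{ij}^{2},
\]
where $\mathcal{C}$ is the cross-correlation matrix between the embeddings of two views of the data; the bound discussed above concerns the inner products that enter $\mathcal{C}$.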
( 2
min )
Stochastic optimal control of dynamical systems is a crucial challenge in
sequential decision-making. Recently, control-as-inference approaches have had
considerable success, providing a viable risk-sensitive framework to address
the exploration-exploitation dilemma. Nonetheless, a majority of these
techniques only invoke the inference-control duality to derive a modified risk
objective that is then addressed within a reinforcement learning framework.
This paper introduces a novel perspective by framing risk-sensitive stochastic
control as Markovian score climbing under samples drawn from a conditional
particle filter. Our approach, while purely inference-centric, provides
asymptotically unbiased estimates for gradient-based policy optimization with
optimal importance weighting and no explicit value function learning. To
validate our methodology, we apply it to the task of learning neural
non-Gaussian feedback policies, showcasing its efficacy on numerical benchmarks
of stochastic dynamical systems.
( 2
min )
Despite the great popularity of virtual screening of existing compound
libraries, the search for new potential drug candidates also takes advantage of
generative protocols, where new compound suggestions are enumerated using
various algorithms. To increase the activity potency of generative approaches,
they have recently been coupled with molecular docking, a leading methodology
of structure-based drug design. In this review, we summarize progress since
docking-based generative models emerged. We propose a new taxonomy for these
methods and discuss their importance for the field of computer-aided drug
design. In addition, we discuss the most promising directions for further
development of generative protocols coupled with docking.
( 2
min )
We study convergence rates of loss and uncertainty-based active learning
algorithms under various assumptions. First, we provide a set of conditions
under which a convergence rate guarantee holds, and use this for linear
classifiers and linearly separable datasets to show convergence rate guarantees
for loss-based sampling and different loss functions. Second, we provide a
framework that allows us to derive convergence rate bounds for loss-based
sampling by deploying known convergence rate bounds for stochastic gradient
descent algorithms. Third, and last, we propose an active learning algorithm
that combines sampling of points and stochastic Polyak's step size. We show a
condition on the sampling that ensures a convergence rate guarantee for this
algorithm for smooth convex loss functions. Our numerical results demonstrate
efficiency of our proposed algorithm.
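For reference, the stochastic Polyak step size used in the third contribution takes the standard form
\[
\gamma_t = \frac{f_{i_t}(\mathbf{w}_t) - f_{i_t}^{\star}}{\lVert \nabla f_{i_t}(\mathbf{w}_t)\rVert^{2}},
\]
where $f_{i_t}$ is the loss on the point sampled at step $t$ and $f_{i_t}^{\star}$ its minimum value (zero for many losses on separable data).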
( 2
min )
Industrial robots are applied in a widening range of industries, but robot
programming mostly remains a task limited to programming experts. We propose a
natural language-based assistant for programming of advanced, industrial
robotic applications and investigate strategies for domain-specific fine-tuning
of foundation models with limited data and compute.
( 2
min )
We propose to train neural networks (NNs) using a novel variant of the
``Additively Preconditioned Trust-region Strategy'' (APTS). The proposed method
is based on a parallelizable additive domain decomposition approach applied to
the neural network's parameters. Built upon the TR framework, the APTS method
ensures global convergence towards a minimizer. Moreover, it eliminates the
need for computationally expensive hyper-parameter tuning, as the TR algorithm
automatically determines the step size in each iteration. We demonstrate the
capabilities, strengths, and limitations of the proposed APTS training method
by performing a series of numerical experiments. The presented numerical study
includes a comparison with widely used training methods such as SGD, Adam,
LBFGS, and the standard TR method.
( 2
min )
Transformer-based Large Language Models (LLMs) have become a fixture in
modern machine learning. Correspondingly, significant resources are allocated
towards research that aims to further advance this technology, typically
resulting in models of increasing size that are trained on increasing amounts
of data. This work, however, demonstrates the surprising result that it is
often possible to significantly improve the performance of LLMs by selectively
removing higher-order components of their weight matrices. This simple
intervention, which we call LAyer-SElective Rank reduction (LASER), can be done
on a model after training has completed, and requires no additional parameters
or data. We show extensive experiments demonstrating the generality of this
finding across language models and datasets, and provide in-depth analyses
offering insights into both when LASER is effective and the mechanism by which
it operates.
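A minimal sketch of the rank-reduction step itself (our illustration; LASER additionally chooses which layer and which weight matrix to reduce):

    import torch

    def laser_reduce(W, keep_frac=0.1):
        # Keep only the top singular components of a weight matrix,
        # discarding the higher-order (small-singular-value) components.
        U, S, Vh = torch.linalg.svd(W, full_matrices=False)
        k = max(1, int(keep_frac * S.numel()))
        return U[:, :k] @ torch.diag(S[:k]) @ Vh[:k, :]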
( 2
min )
InvertibleNetworks.jl is a Julia package designed for the scalable
implementation of normalizing flows, a method for density estimation and
sampling in high-dimensional distributions. This package excels in memory
efficiency by leveraging the inherent invertibility of normalizing flows, which
significantly reduces memory requirements during backpropagation compared to
existing normalizing flow packages that rely on automatic differentiation
frameworks. InvertibleNetworks.jl has been adapted for diverse applications,
including seismic imaging, medical imaging, and CO2 monitoring, demonstrating
its effectiveness in learning high-dimensional distributions.
( 2
min )
Utilizing task-invariant prior knowledge extracted from related tasks,
meta-learning is a principled framework that empowers learning a new task
especially when data records are limited. A fundamental challenge in
meta-learning is how to quickly "adapt" the extracted prior in order to train a
task-specific model within a few optimization steps. Existing approaches deal
with this challenge using a preconditioner that enhances convergence of the
per-task training process. Though effective in representing locally a quadratic
training loss, these simple linear preconditioners can hardly capture complex
loss geometries. The present contribution addresses this limitation by learning
a nonlinear mirror map, which induces a versatile distance metric to enable
capturing and optimizing a wide range of loss geometries, hence facilitating
the per-task training. Numerical tests on few-shot learning datasets
demonstrate the superior expressiveness and convergence of the advocated
approach.
( 2
min )
Many inference scenarios rely on extracting relevant information from known
data in order to make future predictions. When the underlying stochastic
process satisfies certain assumptions, there is a direct mapping between its
exact classical and quantum simulators, with the latter asymptotically using
less memory. Here we focus on studying whether such quantum advantage persists
when those assumptions are not satisfied, and the model is doomed to have
imperfect accuracy. By studying the trade-off between accuracy and memory
requirements, we show that quantum models can reach the same accuracy with less
memory, or alternatively, better accuracy with the same memory. Finally, we
discuss the implications of this result for learning tasks.
( 2
min )
Physical based simulations can be very time and computationally demanding
tasks. One way of accelerating these processes is by making use of data-driven
surrogate models that learn from existing simulations. Ensembling methods are
particularly relevant in this domain as their smoothness properties coincide
with the smoothness of physical phenomena. The drawback is that they can remain
costly. This research project focused on studying Packed-Ensembles that
generalize Deep Ensembles but remain faster to train. Several models have been
trained and compared in terms of multiple important metrics. PE(8,4,1) has been
identified as the clear winner in this particular task, outperforming its Deep
Ensemble counterpart while accelerating training by 25%.
( 2
min )
The generation of cold atom clouds is a complex process which involves the
optimization of noisy data in high dimensional parameter spaces. Optimization
can be challenging both in and especially outside of the lab due to lack of
time, expertise, or access for lengthy manual optimization. In recent years, it
was demonstrated that machine learning offers a solution since it can optimize
high dimensional problems quickly, without knowledge of the experiment itself.
In this paper we present results showing the benchmarking of nine different
optimization techniques and implementations, alongside their ability to
optimize a Rubidium (Rb) cold atom experiment. The investigations are performed
on a 3D $^{87}$Rb molasses in two configurations with 10 and 18 adjustable
parameters, respectively,
where the atom number obtained by absorption imaging was chosen as the test
problem. We further compare the best performing optimizers under different
effective noise conditions by reducing the Signal-to-Noise ratio of the images
via adapting the atomic vapor pressure in the 2D+ MOT and the detection laser
frequency stability.
( 2
min )
Federated bilevel optimization (FBO) has shown great potential recently in
machine learning and edge computing due to the emerging nested optimization
structure in meta-learning, fine-tuning, hyperparameter tuning, etc. However,
existing FBO algorithms often involve complicated computations and require
multiple sub-loops per iteration, each of which contains a number of
communication rounds. In this paper, we propose a simple and flexible FBO
framework named SimFBO, which is easy to implement without sub-loops, and
includes a generalized server-side aggregation and update for improving
communication efficiency. We further propose System-level heterogeneity robust
FBO (ShroFBO) as a variant of SimFBO with stronger resilience to heterogeneous
local computation. We show that SimFBO and ShroFBO provably achieve a linear
convergence speedup with partial client participation and client sampling
without replacement, as well as improved sample and communication complexities.
Experiments demonstrate the effectiveness of the proposed methods over existing
FBO algorithms.
( 2
min )
In this paper, we revisit the bilevel optimization problem, in which the
upper-level objective function is generally nonconvex and the lower-level
objective function is strongly convex. Although this type of problem has been
studied extensively, it still remains an open question how to achieve an
${O}(\epsilon^{-1.5})$ sample complexity in Hessian/Jacobian-free stochastic
bilevel optimization without any second-order derivative computation. To fill
this gap, we propose a novel Hessian/Jacobian-free bilevel optimizer named
FdeHBO, which features a simple fully single-loop structure, a projection-aided
finite-difference Hessian/Jacobian-vector approximation, and momentum-based
updates. Theoretically, we show that FdeHBO requires ${O}(\epsilon^{-1.5})$
iterations (each using ${O}(1)$ samples and only first-order gradient
information) to find an $\epsilon$-accurate stationary point. As far as we
know, this is the first Hessian/Jacobian-free method with an
${O}(\epsilon^{-1.5})$ sample complexity for nonconvex-strongly-convex
stochastic bilevel optimization.
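The finite-difference Hessian/Jacobian-vector approximation at the heart of this construction follows the standard central-difference identity
\[
\nabla^{2} f(x)\,v \;\approx\; \frac{\nabla f(x + \delta v) - \nabla f(x - \delta v)}{2\delta},
\]
which costs only two first-order gradient evaluations per product and involves no explicit second-order computation.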
( 2
min )
We tackle the problem of sampling from intractable high-dimensional density
functions, a fundamental task that often appears in machine learning and
statistics. We extend recent sampling-based approaches that leverage controlled
stochastic processes to model approximate samples from these target densities.
The main drawback of these approaches is that the training objective requires
full trajectories to compute, resulting in sluggish credit assignment issues
due to use of entire trajectories and a learning signal present only at the
terminal time. In this work, we present Diffusion Generative Flow Samplers
(DGFS), a sampling-based framework where the learning process can be tractably
broken down into short partial trajectory segments, via parameterizing an
additional "flow function". Our method takes inspiration from the theory
developed for generative flow networks (GFlowNets), allowing us to make use of
intermediate learning signals. Through various challenging experiments, we
demonstrate that DGFS achieves more accurate estimates of the normalization
constant than closely-related prior methods.
( 2
min )
Uncertainty estimation is a key issue when considering the application of
deep neural network methods in science and engineering. In this work, we
introduce a novel algorithm that quantifies epistemic uncertainty via Monte
Carlo sampling from a tempered posterior distribution. It combines the well
established Metropolis Adjusted Langevin Algorithm (MALA) with momentum-based
optimization using Adam and leverages a prolate proposal distribution, to
efficiently draw from the posterior. We prove that the constructed chain admits
the Gibbs posterior as an invariant distribution and converges to this Gibbs
posterior in total variation distance. Numerical evaluations are postponed to a
first revision.
( 2
min )
Recently multi-armed bandit problem arises in many real-life scenarios where
arms must be sampled in batches, due to limited time the agent can wait for the
feedback. Such applications include biological experimentation and online
marketing. The problem is further complicated when the number of arms is large
and the number of batches is small. We consider pure exploration in a batched
multi-armed bandit problem. We introduce a general linear programming framework
that can incorporate objectives of different theoretical settings in best arm
identification. The linear program leads to a two-stage algorithm that can
achieve good theoretical properties. We demonstrate by numerical studies that
the algorithm also has good performance compared to certain UCB-type or
Thompson sampling methods.
( 2
min )
Convex clustering is a modern method with both hierarchical and $k$-means
clustering characteristics. Although convex clustering can capture complex
clustering structures hidden in data, the existing convex clustering algorithms
are not scalable to large data sets with sample sizes greater than several
thousands. Moreover, it is known that convex clustering sometimes fails to
produce a complete hierarchical clustering structure. This issue arises if
clusters split up or the minimum number of possible clusters is larger than the
desired number of clusters. In this paper, we propose convex clustering through
majorization-minimization (CCMM) -- an iterative algorithm that uses cluster
fusions and a highly efficient updating scheme derived using diagonal
majorization. Additionally, we explore different strategies to ensure that the
hierarchical clustering structure terminates in a single cluster. With a
current desktop computer, CCMM efficiently solves convex clustering problems
featuring over one million objects in seven-dimensional space, achieving a
solution time of 51 seconds on average.
( 2
min )
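The flavor of a majorization-minimization step for convex clustering can be sketched as follows: each norm in the fusion penalty is majorized by a quadratic at the current iterate, giving a closed-form Jacobi-style update. This is a simplified illustration, not the paper's exact diagonal-majorization scheme.

    import numpy as np

    def convex_clustering_mm(X, W, lam, iters=100, eps=1e-8):
        # Minimizes 0.5 * sum_i ||x_i - u_i||^2 + lam * sum_{i<j} W_ij ||u_i - u_j||
        # by repeatedly minimizing a quadratic surrogate of the fusion penalty.
        U = X.copy()
        for _ in range(iters):
            D = np.linalg.norm(U[:, None, :] - U[None, :, :], axis=2) + eps
            C = lam * W / D                       # surrogate weights C_ij
            np.fill_diagonal(C, 0.0)
            denom = 1.0 + C.sum(axis=1, keepdims=True)
            U = (X + C @ U) / denom               # closed-form surrogate minimizer
        return U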
Pufferfish privacy is a flexible generalization of differential privacy that
allows one to model arbitrary secrets and an adversary's prior knowledge about the
data. Unfortunately, designing general and tractable Pufferfish mechanisms that
do not compromise utility is challenging. Furthermore, this framework does not
provide the composition guarantees needed for a direct use in iterative machine
learning algorithms. To mitigate these issues, we introduce a R\'enyi
divergence-based variant of Pufferfish and show that it allows us to extend the
applicability of the Pufferfish framework. We first generalize the Wasserstein
mechanism to cover a wide range of noise distributions and introduce several
ways to improve its utility. We also derive stronger guarantees against
out-of-distribution adversaries. Finally, as an alternative to composition, we
prove privacy amplification results for contractive noisy iterations and
showcase the first use of Pufferfish in private convex optimization. A common
ingredient underlying our results is the use and extension of shift reduction
lemmas.
( 2
min )
Large language model (LLM) training has surged in popularity over the last year with the release of several popular models such as Llama 2, Falcon, and Mistral. Customers are now pre-training and fine-tuning LLMs ranging from 1 billion to over 175 billion parameters to optimize model performance for applications across industries, from healthcare to finance […]
( 9
min )
Today, we are excited to announce that the Mixtral-8x7B large language model (LLM), developed by Mistral AI, is available for customers through Amazon SageMaker JumpStart to deploy with one click for running inference. The Mixtral-8x7B LLM is a pre-trained sparse mixture-of-experts model, based on a 7-billion parameter backbone with eight experts per feed-forward […]
( 11
min )
This blog is co-written with Josh Reini, Shayak Sen and Anupam Datta from TruEra. Amazon SageMaker JumpStart provides a variety of pretrained foundation models such as Llama-2 and Mistral 7B that can be quickly deployed to an endpoint. These foundation models perform well with generative tasks, from crafting text and summaries and answering questions to producing […]
( 12
min )
Generative AI agents are capable of producing human-like responses and engaging in natural language conversations by orchestrating a chain of calls to foundation models (FMs) and other augmenting tools based on user input. Instead of only fulfilling predefined intents through a static decision tree, agents are autonomous within the context of their suite of available […]
( 15
min )
As I completed this blog series, the European Union (EU) announced its AI Regulation Law. The European Union’s AI Regulation Act seeks to ensure AI’s ethical and safe deployment in the EU. Coming on the heels of the White House’s “Executive Order on the Safe, Secure, and Trustworthy Development and Use of Artificial Intelligence,” we…
( 21
min )
Master's students Irene Terpstra ’23 and Rujul Gandhi ’22 use language to design new integrated circuits and make it understandable to robots.
( 9
min )
AI saw unparalleled growth in 2023, reaching millions daily. This progress owes much to the extensive work of Microsoft researchers and collaborators. In this review, learn about the advances in 2023, which set the stage for further progress in 2024.
( 17
min )
Quantization replaces floating point arithmetic with integer arithmetic in
deep neural network models, providing more efficient on-device inference with
less power and memory. In this work, we propose a framework for formally
verifying properties of quantized neural networks. Our baseline technique is
based on integer linear programming which guarantees both soundness and
completeness. We then show how efficiency can be improved by utilizing
gradient-based heuristic search methods and also bound-propagation techniques.
We evaluate our approach on perception networks quantized with PyTorch. Our
results show that we can verify quantized networks with better scalability and
efficiency than the previous state of the art.
( 2
min )
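To illustrate the integer-linear-programming baseline, the toy below encodes a single quantized ReLU neuron with the standard big-M trick using PuLP; the weights, value ranges, and queried property are assumptions chosen for illustration, not the paper's benchmarks.

    import pulp

    w, b = [2, -1, 3], -4                     # toy integer weights/bias (assumed)
    M = 2048                                  # any bound exceeding |w.x + b|
    prob = pulp.LpProblem("quantized_relu_bound", pulp.LpMaximize)
    x = [pulp.LpVariable(f"x{i}", lowBound=0, upBound=255, cat="Integer")
         for i in range(3)]                   # 8-bit inputs
    pre = pulp.lpSum(wi * xi for wi, xi in zip(w, x)) + b
    y = pulp.LpVariable("y", lowBound=0, cat="Integer")
    d = pulp.LpVariable("d", cat="Binary")    # d = 1 iff the neuron is active
    prob += y >= pre                          # together these force y = max(0, pre)
    prob += y <= pre + M * (1 - d)
    prob += y <= M * d
    prob += y                                 # objective: worst-case output
    prob.solve(pulp.PULP_CBC_CMD(msg=False))
    print("max output:", pulp.value(y))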
Deep generative models, such as diffusion models, GANs, and IMLE, have shown
impressive capability in tackling inverse problems. However, the validity of
model-generated solutions w.r.t. the forward problem and the reliability of
associated uncertainty estimates remain understudied. This study evaluates
recent diffusion-based, GAN-based, and IMLE-based methods on three inverse
problems, i.e., $16\times$ super-resolution, colourization, and image
decompression. We assess the validity of these models' outputs as solutions to
the inverse problems and conduct a thorough analysis of the reliability of the
models' estimates of uncertainty over the solution. Overall, we find that the
IMLE-based CHIMLE method outperforms other methods in terms of producing valid
solutions and reliable uncertainty estimates.
( 2
min )
The selection of the assumed effect size (AES) critically determines the
duration of an experiment, and hence its accuracy and efficiency.
Traditionally, experimenters determine AES based on domain knowledge. However,
this method becomes impractical for online experimentation services managing
numerous experiments, and a more automated approach is hence of great demand.
We initiate the study of data-driven AES selection for online
experimentation services by introducing two solutions. The first employs a
three-layer Gaussian Mixture Model considering the heteroskedasticity across
experiments, and it seeks to estimate the true expected effect size among
positive experiments. The second method, grounded in utility theory, aims to
determine the optimal effect size by striking a balance between the
experiment's cost and the precision of decision-making. Through comparisons
with baseline methods using both simulated and real data, we showcase the
superior performance of the proposed approaches.
( 2
min )
Fusing measurements from multiple heterogeneous, partial sources observing a
common object or process poses challenges that grow with the increasing number
and variety of available sensors. In this work we propose, implement and
validate an end-to-end computational pipeline in the form of a
multiple-auto-encoder neural network architecture for this task. The inputs to
the pipeline are several sets of partial observations, and the result is a
globally consistent latent space, harmonizing (rigidifying, fusing) all
measurements. The key enabler is the availability of multiple slightly
perturbed measurements of each instance: local measurement "bursts" that
allow us to estimate the local distortion induced by each instrument. We
demonstrate the approach in a sequence of examples, starting with simple
two-dimensional data sets and proceeding to a Wi-Fi localization problem and to
the solution of a "dynamical puzzle" arising in spatio-temporal observations of
the solutions of Partial Differential Equations.
( 2
min )
We introduce the Efficient Title Reranker (ETR) via a Broadcasting Query
Encoder, a novel technique that performs title reranking 20x-40x faster than a
vanilla passage reranker. However, training the Efficient Title Reranker can be
unstable. Analyzing the issue, we found that some very difficult ground truths
can act as noisy labels, causing accuracy to drop, and that extreme values in
the model's probability output can produce NaNs. To address these issues, we
introduce the Sigmoid Trick, a novel technique that reduces the gradient update
in both cases, resulting in better retrieval efficacy. Experiments showed the
effectiveness of ETR and the Sigmoid Trick, as we achieved four
state-of-the-art positions on the KILT knowledge benchmark.
( 2
min )
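One plausible reading of the Sigmoid Trick is a pairwise loss whose sigmoid saturation damps gradients at both extremes, for pairs that are extremely wrong (likely noisy labels) and for extreme score gaps alike, since the sigmoid's derivative vanishes in both tails. The sketch below is that reconstruction, not the paper's exact formulation.

    import torch

    def sigmoid_trick_loss(pos_score, neg_scores):
        # pos_score: (B,) score of the ground-truth title
        # neg_scores: (B, K) scores of negative titles
        margins = neg_scores - pos_score.unsqueeze(-1)   # > 0 means a mistake
        return torch.sigmoid(margins).mean()             # gradient -> 0 at both tails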
We present a novel approach to non-convex optimization with certificates,
which handles smooth functions on the hypercube or on the torus. Unlike
traditional methods that rely on algebraic properties, our algorithm exploits
the regularity of the target function intrinsic in the decay of its Fourier
spectrum. By defining a tractable family of models, we simultaneously obtain
precise certificates and leverage the advanced and powerful
computational techniques developed to optimize neural networks. In this way the
scalability of our approach is naturally enhanced by parallel computing with
GPUs. Our approach, when applied to the case of polynomials of moderate
dimensions but with thousands of coefficients, outperforms the state-of-the-art
optimization methods with certificates, as the ones based on Lasserre's
hierarchy, addressing problems intractable for the competitors.
( 2
min )
Neural networks are powerful tools in various applications, and quantifying
their uncertainty is crucial for reliable decision-making. In the deep learning
field, the uncertainties are usually categorized into aleatoric (data) and
epistemic (model) uncertainty. In this paper, we point out that the existing
popular variance attenuation method highly overestimates aleatoric uncertainty.
To address this issue, we propose a new estimation method by actively
de-noising the observed data. By conducting a broad range of experiments, we
demonstrate that our proposed approach provides a much closer approximation to
the actual data uncertainty than the standard method.
( 2
min )
Generative Adversarial Networks (GANs) have become a ubiquitous technology
for data generation, with their prowess in image generation being
well-established. However, their application in generating tabular data has
been less than ideal. Furthermore, attempting to incorporate differential
privacy technology into these frameworks has often resulted in a degradation of
data utility. To tackle these challenges, this paper introduces DP-SACTGAN, a
novel Conditional Generative Adversarial Network (CGAN) framework for
differentially private tabular data generation. Experimental findings
demonstrate that DP-SACTGAN not only
accurately models the distribution of the original data but also effectively
satisfies the requirements of differential privacy.
( 2
min )
Measurement-based quantum computation (MBQC) is a paradigm for quantum
computation where computation is driven by local measurements on a suitably
entangled resource state. In this work we show that MBQC is related to a model
of quantum computation based on Clifford quantum cellular automata (CQCA).
Specifically, we show that certain MBQCs can be directly constructed from CQCAs
which yields a simple and intuitive circuit model representation of MBQC in
terms of quantum computation based on CQCA. We apply this description to
construct various MBQC-based Ans\"atze for parameterized quantum circuits,
demonstrating that the different Ans\"atze may lead to significantly different
performances on different learning tasks. In this way, MBQC yields a family of
Hardware-efficient Ans\"atze that may be adapted to specific problem settings
and is particularly well suited for architectures with translationally
invariant gates such as neutral atoms.
( 2
min )
External control arms (ECA) can inform the early clinical development of
experimental drugs and provide efficacy evidence for regulatory approval in
non-randomized settings. However, the main challenge of implementing ECA lies
in accessing real-world data or historical clinical trials. Indeed, data
sharing is often not feasible due to privacy considerations related to data
leaving the original collection centers, along with pharmaceutical companies'
competitive motives. In this paper, we leverage a privacy-enhancing technology
called federated learning (FL) to remove some of the barriers to data sharing.
We introduce a federated learning inverse probability of treatment weighted
(IPTW) method for time-to-event outcomes called FedECA which eases the
implementation of ECA by limiting patients' data exposure. We show with
extensive experiments that FedECA outperforms its closest competitor,
matching-adjusted indirect comparison (MAIC), in terms of statistical power and
ability to balance the treatment and control groups. To encourage the use of
such methods, we publicly release our code which relies on Substra, an
open-source FL software with proven experience in privacy-sensitive contexts.
( 3
min )
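For orientation, the IPTW ingredient of such an analysis looks as follows in a single-site (non-federated) setting; the column names and the use of scikit-learn and lifelines are illustrative assumptions, while FedECA computes this kind of estimator without pooling patient-level data.

    import pandas as pd
    from sklearn.linear_model import LogisticRegression
    from lifelines import CoxPHFitter

    def iptw_hazard_ratio(df, covariates):
        # df columns: covariates, 'treated' (0/1), 'time', 'event'
        ps = LogisticRegression(max_iter=1000).fit(
            df[covariates], df["treated"]).predict_proba(df[covariates])[:, 1]
        t = df["treated"].to_numpy()
        df = df.assign(w=t / ps + (1 - t) / (1 - ps))   # inverse propensity weights
        cph = CoxPHFitter()
        cph.fit(df[["time", "event", "treated", "w"]],
                duration_col="time", event_col="event",
                weights_col="w", robust=True)           # robust SEs under weighting
        return cph.hazard_ratios_["treated"]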
Text segmentation, the task of dividing a document into sections, is often a
prerequisite for performing additional natural language processing tasks.
Existing text segmentation methods have typically been developed and tested
using clean, narrative-style text with segments containing distinct topics.
Here we consider a challenging text segmentation task: dividing newspaper
marriage announcement lists into units of one announcement each. In many cases
the information is not structured into sentences, and adjacent segments are not
topically distinct from each other. In addition, the text of the announcements,
which is derived from images of historical newspapers via optical character
recognition, contains many typographical errors. As a result, these
announcements are not amenable to segmentation with existing techniques. We
present a novel deep learning-based model for segmenting such text and show
that it significantly outperforms an existing state-of-the-art method on our
task.
( 2
min )
We propose a novel machine learning method for sampling from the
high-dimensional probability distributions of Lattice Field Theories, which is
based on a single neural ODE layer and incorporates the full symmetries of the
problem. We test our model on the $\phi^4$ theory, showing that it
systematically outperforms previously proposed flow-based methods in sampling
efficiency, and the improvement is especially pronounced for larger lattices.
Furthermore, we demonstrate that our model can learn a continuous family of
theories at once, and the results of learning can be transferred to larger
lattices. Such generalizations further accentuate the advantages of machine
learning methods.
( 2
min )
Nowadays, neural-network-based image- and video-quality metrics show better
performance than traditional methods. However, they have also become more
vulnerable to adversarial attacks that increase metrics' scores without
improving visual quality. The existing benchmarks of quality metrics compare
their performance in terms of correlation with subjective quality and
calculation time. However, the adversarial robustness of image-quality metrics
is also an area worth researching. In this paper, we analyse modern metrics'
robustness to different adversarial attacks. We adopted adversarial attacks
from computer vision tasks and compared attacks' efficiency against 15
no-reference image/video-quality metrics. Some metrics showed high resistance
to adversarial attacks which makes their usage in benchmarks safer than
vulnerable metrics. The benchmark accepts new metrics submissions for
researchers who want to make their metrics more robust to attacks or to find
such metrics for their needs. Try our benchmark using pip install
robustness-benchmark.
( 2
min )
We propose to learn non-convex regularizers with a prescribed upper bound on
their weak-convexity modulus. Such regularizers give rise to variational
denoisers that minimize a convex energy. They rely on few parameters (fewer than
15,000) and offer a signal-processing interpretation as they mimic handcrafted
sparsity-promoting regularizers. Through numerical experiments, we show that
such denoisers outperform convex-regularization methods as well as the popular
BM3D denoiser. Additionally, the learned regularizer can be deployed to solve
inverse problems with iterative schemes that provably converge. For both CT and
MRI reconstruction, the regularizer generalizes well and offers an excellent
tradeoff between performance, number of parameters, guarantees, and
interpretability when compared to other data-driven approaches.
( 2
min )
Recent studies show that deep reinforcement learning (DRL) agents tend to
overfit to the task on which they were trained and fail to adapt to minor
environment changes. To expedite learning when transferring to unseen tasks, we
propose a novel approach to representing the current task using reward machines
(RMs), state machine abstractions that induce subtasks based on the current
task's rewards and dynamics. Our method provides agents with symbolic
representations of optimal transitions from their current abstract state and
rewards them for achieving these transitions. These representations are shared
across tasks, allowing agents to exploit knowledge of previously encountered
symbols and transitions, thus enhancing transfer. Empirical results show that
our representations improve sample efficiency and few-shot transfer in a
variety of domains.
( 2
min )
We propose a simple and general framework for nonparametric estimation of
heterogeneous treatment effects under fairness constraints. Under standard
regularity conditions, we show that the resulting estimators possess the double
robustness property. We use this framework to characterize the trade-off
between fairness and the maximum welfare achievable by the optimal policy. We
evaluate the methods in a simulation study and illustrate them in a real-world
case study.
( 2
min )
The recent popularity of text-to-image diffusion models (DM) can largely be
attributed to the intuitive interface they provide to users. The intended
generation can be expressed in natural language, with the model producing
faithful interpretations of text prompts. However, expressing complex or
nuanced ideas in text alone can be difficult. To ease image generation, we
propose MultiFusion that allows one to express complex and nuanced concepts
with arbitrarily interleaved inputs of multiple modalities and languages.
MultiFusion leverages pre-trained models and aligns them for integration into a
cohesive system, thereby avoiding the need for extensive training from scratch.
Our experimental results demonstrate the efficient transfer of capabilities
from individual modules to the downstream model. Specifically, the fusion of
all independent components allows the image generation module to utilize
multilingual, interleaved multimodal inputs despite being trained solely on
monomodal data in a single language.
( 2
min )
Focusing on stochastic programming (SP) with covariate information, this
paper proposes an empirical risk minimization (ERM) method embedded within a
nonconvex piecewise affine decision rule (PADR), which aims to learn the direct
mapping from features to optimal decisions. We establish the nonasymptotic
consistency result of our PADR-based ERM model for unconstrained problems and
asymptotic consistency result for constrained ones. To solve the nonconvex and
nondifferentiable ERM problem, we develop an enhanced stochastic
majorization-minimization algorithm and establish the asymptotic convergence to
(composite strong) directional stationarity along with complexity analysis. We
show that the proposed PADR-based ERM method applies to a broad class of
nonconvex SP problems with theoretical consistency guarantees and computational
tractability. Our numerical study demonstrates the superior performance of
PADR-based ERM methods compared to state-of-the-art approaches under various
settings, with significantly lower costs, less computation time, and robustness
to feature dimensions and nonlinearity of the underlying dependency.
( 2
min )
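To see what a nonconvex piecewise affine decision rule can look like, note that any continuous piecewise affine map admits a difference-of-max-affine representation; the parameterization below is one such illustrative choice, not necessarily the paper's exact class.

    import numpy as np

    def padr(x, A1, b1, A2, b2):
        # f(x) = max_k (A1[k] @ x + b1[k]) - max_k (A2[k] @ x + b2[k])
        return (A1 @ x + b1).max() - (A2 @ x + b2).max()

    rng = np.random.default_rng(0)
    A1, b1 = rng.normal(size=(4, 3)), rng.normal(size=4)
    A2, b2 = rng.normal(size=(4, 3)), rng.normal(size=4)
    print(padr(rng.normal(size=3), A1, b1, A2, b2))     # a scalar decision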
Time Series Classification and Extrinsic Regression are important and
challenging machine learning tasks. Deep learning has revolutionized natural
language processing and computer vision and holds great promise in other fields
such as time series analysis where the relevant features must often be
abstracted from the raw data but are not known a priori. This paper surveys the
current state of the art in the fast-moving field of deep learning for time
series classification and extrinsic regression. We review different network
architectures and training methods used for these tasks and discuss the
challenges and opportunities when applying deep learning to time series data.
We also summarize two critical applications of time series classification and
extrinsic regression, human activity recognition and satellite earth
observation.
( 2
min )
To mitigate global warming, greenhouse gas sources need to be resolved at a
high spatial resolution and monitored in time to ensure the reduction and
ultimately elimination of the pollution source. However, the computational
complexity of resolving high-resolution wind fields makes it impractical to
test simulations over different time lengths and model configurations. This study
presents a preliminary development of a physics-informed super-resolution (SR)
generative adversarial network (GAN) that super-resolves the three-dimensional
(3D) low-resolution wind fields by a factor of nine. We develop a pixel-wise
self-attention (PWA) module that learns 3D weather dynamics via a
self-attention computation followed by a 2D convolution. We also employ a loss
term that regularizes the self-attention map during pretraining, capturing the
vertical convection process from input wind data. The new PWA SR-GAN shows the
high-fidelity super-resolved 3D wind data, learns a wind structure at the
high-frequency domain, and reduces the computational cost of a high-resolution
wind simulation by a factor of 89.7.
( 2
min )
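A rough sketch of a pixel-wise self-attention block of the kind described, attending across vertical levels at each pixel and following with a 2D convolution; the layer sizes and tensor layout here are guesses, not the paper's architecture.

    import torch
    import torch.nn as nn

    class PixelWiseSelfAttention(nn.Module):
        def __init__(self, levels, dim):
            super().__init__()
            self.attn = nn.MultiheadAttention(dim, num_heads=1, batch_first=True)
            self.conv = nn.Conv2d(levels * dim, levels * dim, 3, padding=1)

        def forward(self, x):                       # x: (B, levels, dim, H, W)
            B, L, C, H, W = x.shape
            seq = x.permute(0, 3, 4, 1, 2).reshape(B * H * W, L, C)
            seq, _ = self.attn(seq, seq, seq)       # attend across the L levels
            out = seq.reshape(B, H, W, L, C).permute(0, 3, 4, 1, 2)
            out = self.conv(out.reshape(B, L * C, H, W))
            return out.reshape(B, L, C, H, W)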
This paper introduces Structured Noise Space GAN (SNS-GAN), a novel approach
in the field of generative modeling specifically tailored for class-conditional
generation in both image and time series data. It addresses the challenge of
effectively integrating class labels into generative models without requiring
structural modifications to the network. The SNS-GAN method embeds class
conditions within the generator's noise space, simplifying the training process
and enhancing model versatility. The model's efficacy is demonstrated through
qualitative validations in the image domain and superior performance in time
series generation compared to baseline models. This research opens new avenues
for the application of GANs in various domains, including but not limited to
time series and image data generation.
( 2
min )
In this work, we study the problem of stability of Graph Convolutional Neural
Networks (GCNs) under random small perturbations in the underlying graph
topology, i.e. under a limited number of insertions or deletions of edges. We
derive a novel bound on the expected difference between the outputs of
unperturbed and perturbed GCNs. The proposed bound explicitly depends on the
magnitude of the perturbation of the eigenpairs of the Laplacian matrix, and
the perturbation explicitly depends on which edges are inserted or deleted.
Then, we provide a quantitative characterization of the effect of perturbing
specific edges on the stability of the network. We leverage tools from small
perturbation analysis to express the bounds in closed, albeit approximate,
form, in order to enhance interpretability of the results, without the need to
compute any perturbed shift operator. Finally, we numerically evaluate the
effectiveness of the proposed bound.
( 2
min )
We propose an energy-efficient equalizer for IM/DD systems based on spiking
neural networks. We optimize a neural spike encoding that boosts the
equalizer's performance while decreasing energy consumption.
( 2
min )
Deep reinforcement learning has advanced greatly and been applied in many areas.
In this paper, we explore the vulnerability of deep reinforcement learning by
proposing a novel generative model for creating effective adversarial examples
to attack the agent. Our proposed model can achieve both targeted attacks and
untargeted attacks. Considering the specificity of deep reinforcement learning,
we propose the action consistency ratio as a measure of stealthiness, along
with a new index that jointly measures effectiveness and stealthiness.
Experimental results show
that our method can ensure the effectiveness and stealthiness of attack
compared with other algorithms. Moreover, our methods are considerably faster
and thus can achieve rapid and efficient verification of the vulnerability of
deep reinforcement learning.
( 2
min )
Motivated by the interpretability question in ML models as a crucial element
for the successful deployment of AI systems, this paper focuses on rule
extraction as a means of neural network interpretability. Through a
systematic literature review, different approaches for extracting rules from
feedforward neural networks, an important block in deep learning models, are
identified and explored. The findings reveal a range of methods developed for
over two decades, mostly suitable for shallow neural networks, with recent
developments to meet deep learning models' challenges. Rules offer a
transparent and intuitive means of explaining neural networks, making this
study a comprehensive introduction for researchers interested in the field.
While the study specifically addresses feedforward networks with supervised
learning and crisp rules, future work can extend to other network types,
machine learning methods, and fuzzy rule extraction.
( 2
min )
Exponential families are statistical models which are the workhorses in
statistics, information theory, and machine learning. An exponential family can
either be normalized subtractively by its cumulant function or equivalently
normalized divisively by its partition function. Both subtractive and divisive
normalizers are strictly convex and smooth functions inducing pairs of Bregman
and Jensen divergences. It is well known that skewed Bhattacharyya distances
between probability densities of an exponential family amount to skewed Jensen
divergences induced by the cumulant function between their corresponding
natural parameters, and that in limit cases the sided Kullback-Leibler
divergences amount to reverse-sided Bregman divergences. In this note, we first
show that the $\alpha$-divergences between unnormalized densities of an
exponential family amount to scaled $\alpha$-skewed Jensen divergences induced
by the partition function. We then show how comparative convexity with respect
to a pair of quasi-arithmetic means allows us to deform convex functions and
define dually flat spaces with corresponding divergences when ordinary
convexity is preserved.
( 2
min )
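The first claim can be spelled out explicitly. For unnormalized densities $\tilde p_\theta(x) = \exp(\theta^\top t(x))$ of an exponential family with partition function $Z$ (carrier term absorbed into the base measure $\mu$) and $\alpha \in (0,1)$:

    D_\alpha(\tilde p_{\theta_1} : \tilde p_{\theta_2})
      = \frac{1}{\alpha(1-\alpha)} \int \Big( \alpha \tilde p_{\theta_1}
          + (1-\alpha)\, \tilde p_{\theta_2}
          - \tilde p_{\theta_1}^{\alpha}\, \tilde p_{\theta_2}^{1-\alpha} \Big)\, d\mu
      = \frac{J_{Z,\alpha}(\theta_1, \theta_2)}{\alpha(1-\alpha)},

since $\int \tilde p_{\theta_1}^{\alpha} \tilde p_{\theta_2}^{1-\alpha}\, d\mu = Z(\alpha\theta_1 + (1-\alpha)\theta_2)$ and $\int \tilde p_{\theta_i}\, d\mu = Z(\theta_i)$, where $J_{Z,\alpha}(\theta_1,\theta_2) = \alpha Z(\theta_1) + (1-\alpha) Z(\theta_2) - Z(\alpha\theta_1 + (1-\alpha)\theta_2)$ is the $\alpha$-skewed Jensen divergence induced by $Z$.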
This paper studies bandit problems where an agent has access to offline data
that might be utilized to potentially improve the estimation of each arm's
reward distribution. A major obstacle in this setting is the existence of
compound biases from the observational data. Ignoring these biases and blindly
fitting a model with the biased data could even negatively affect the online
learning phase. In this work, we formulate this problem from a causal
perspective. First, we categorize the biases into confounding bias and
selection bias based on the causal structure they imply. Next, we extract the
causal bound for each arm that is robust towards compound biases from biased
observational data. The derived bounds contain the ground truth mean reward and
can effectively guide the bandit agent to learn a nearly-optimal decision
policy. We also conduct regret analysis in both contextual and non-contextual
bandit settings and show that prior causal bounds could help consistently
reduce the asymptotic regret.
( 2
min )
Graph clustering is a fundamental and challenging task in the field of graph
mining where the objective is to group the nodes into clusters taking into
consideration the topology of the graph. It has several applications in diverse
domains spanning social network analysis, recommender systems, computer vision,
and bioinformatics. In this work, we propose a novel method, DGCluster, which
primarily optimizes the modularity objective using graph neural networks and
scales linearly with the graph size. Our method does not require the number of
clusters to be specified as a part of the input and can also leverage the
availability of auxiliary node level information. We extensively test DGCluster
on several real-world datasets of varying sizes, across multiple popular
cluster quality metrics. Our approach consistently outperforms the
state-of-the-art methods, demonstrating significant performance gains in almost
all settings.
( 2
min )
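The core of the modularity objective can be written as a differentiable loss over soft cluster assignments; a minimal sketch follows (the paper's full objective may add auxiliary terms for node attributes).

    import torch

    def soft_modularity_loss(A, C):
        # A: (n, n) dense adjacency; C: (n, k) row-softmax soft assignments.
        # Q = (1 / 2m) * Tr(C^T (A - d d^T / 2m) C); we minimize -Q.
        d = A.sum(dim=1, keepdim=True)            # node degrees
        two_m = d.sum()                           # 2m = total degree
        B = A - (d @ d.t()) / two_m               # modularity matrix
        return -torch.trace(C.t() @ B @ C) / two_m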
Designing studies that apply causal discovery requires navigating many
researcher degrees of freedom. This complexity is exacerbated when the study
involves fMRI data. In this paper we (i) describe nine challenges that occur
when applying causal discovery to fMRI data, (ii) discuss the space of
decisions that need to be made, (iii) review how a recent case study made those
decisions, (iv) and identify existing gaps that could potentially be solved by
the development of new methods. Overall, causal discovery is a promising
approach for analyzing fMRI data, and multiple successful applications have
indicated that it is superior to traditional fMRI functional connectivity
methods, but current causal discovery methods for fMRI leave room for
improvement.
( 2
min )
Multi-fidelity Bayesian Optimisation (MFBO) has been shown to generally
converge faster than single-fidelity Bayesian Optimisation (SFBO) (Poloczek et
al. (2017)). Inspired by recent benchmark papers, we are investigating the
long-run behaviour of MFBO, based on observations in the literature that it
might under-perform in certain scenarios (Mikkola et al. (2023), Eggensperger
et al. (2021)). An under-performance of MFBO in the long run could
significantly undermine its application to many research tasks, especially when
we are not able to identify when the under-performance begins. We create a
simple benchmark study, showcase empirical results, and discuss scenarios and
possible reasons for under-performance.
( 2
min )
This work presents the PORTALS framework, which leverages surrogate modeling
and optimization techniques to enable the prediction of core plasma profiles
and performance with nonlinear gyrokinetic simulations at significantly reduced
cost, with no loss of accuracy. The efficiency of PORTALS is benchmarked
against standard methods, and its full potential is demonstrated on a unique,
simultaneous 5-channel (electron temperature, ion temperature, electron
density, impurity density and angular rotation) prediction of steady-state
profiles in a DIII-D ITER Similar Shape plasma with GPU-accelerated, nonlinear
CGYRO. This paper also provides general guidelines for accurate performance
predictions in burning plasmas and the impact of transport modeling in fusion
pilot plant studies.
( 2
min )
Fairness AI aims to detect and alleviate bias across the entire AI
development life cycle, encompassing data curation, modeling, evaluation, and
deployment, a pivotal aspect of ethical AI implementation. For addressing data
bias, particularly concerning sensitive attributes like gender and race,
reweighting samples proves efficient. This paper contributes a systematic
examination of reweighting samples for traditional machine learning (ML)
models, employing five models for binary classification on the Adult Income and
COMPAS datasets with various protected attributes. The study evaluates
prediction results using five fairness metrics, uncovering the nuanced and
model-specific nature of sample reweighting's effectiveness in achieving
fairness in traditional ML models, as well as revealing the complexity of bias
dynamics.
( 2
min )
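One standard scheme of the kind examined above is Kamiran-Calders reweighing, which weights each sample by w(a, y) = P(a) P(y) / P(a, y) so that the protected attribute becomes statistically independent of the label in the weighted data. A minimal sketch, with column names assumed for illustration:

    import pandas as pd

    def reweighing_weights(df, attr="sex", label="income"):
        p_a = df[attr].value_counts(normalize=True)        # P(a)
        p_y = df[label].value_counts(normalize=True)       # P(y)
        p_ay = df.groupby([attr, label]).size() / len(df)  # P(a, y)
        return df.apply(
            lambda r: p_a[r[attr]] * p_y[r[label]] / p_ay[(r[attr], r[label])],
            axis=1)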
Motivated by recent work on lifelong learning applications for language
models (LMs) of code, we introduce CodeLL, a lifelong learning dataset focused
on code changes. Our contribution addresses a notable research gap marked by
the absence of a long-term temporal dimension in existing code change datasets,
limiting their suitability in lifelong learning scenarios. In contrast, our
dataset aims to comprehensively capture code changes across the entire release
history of open-source software repositories. In this work, we introduce an
initial version of CodeLL, comprising 71 machine-learning-based projects mined
from Software Heritage. This dataset enables the extraction and in-depth
analysis of code changes spanning 2,483 releases at both the method and API
levels. CodeLL enables researchers to study the behaviour of LMs in lifelong
fine-tuning settings for learning code changes. Additionally, the dataset can
help in studying data distribution shifts within software repositories and the
evolution of API usages over time.
( 2
min )
This paper explores the feasibility and performance of on-device large
language model (LLM) inference on various Apple iPhone models. Amidst the rapid
evolution of generative AI, on-device LLMs offer solutions to privacy,
security, and connectivity challenges inherent in cloud-based models.
Leveraging existing literature on running multi-billion parameter LLMs on
resource-limited devices, our study examines the thermal effects and
interaction speeds of a high-performing LLM across different smartphone
generations. We present real-world performance results, providing insights into
on-device inference capabilities.
( 2
min )
Neural construction models have shown promising performance for Vehicle
Routing Problems (VRPs) by adopting either the Autoregressive (AR) or
Non-Autoregressive (NAR) learning approach. While AR models produce
high-quality solutions, they generally have a high inference latency due to
their sequential generation nature. Conversely, NAR models generate solutions
in parallel with a low inference latency but generally exhibit inferior
performance. In this paper, we propose a generic Guided Non-Autoregressive
Knowledge Distillation (GNARKD) method to obtain high-performance NAR models
having a low inference latency. GNARKD removes the constraint of sequential
generation in AR models while preserving the learned pivotal components in the
network architecture to obtain the corresponding NAR models through knowledge
distillation. We evaluate GNARKD by applying it to three widely adopted AR
models to obtain NAR VRP solvers for both synthesized and real-world instances.
The experimental results demonstrate that GNARKD significantly reduces the
inference time (4-5 times faster) with an acceptable performance drop (2-3\%).
To the best of our knowledge, this study is the first of its kind to obtain NAR VRP
solvers from AR ones through knowledge distillation.
( 2
min )
We present a study on the integration of Large Language Models (LLMs) in
tabular data classification, emphasizing an efficient framework. Building upon
existing work done in TabLLM (arXiv:2210.10723), we introduce three novel
serialization techniques, most notably a LaTeX serialization method. This
method significantly boosts the performance of LLMs in processing
domain-specific datasets. Our method stands out for its memory efficiency and
ability to fully utilize complex data structures. Through extensive
experimentation, including various serialization approaches like feature
combination and importance, we demonstrate our work's superiority in accuracy
and efficiency over traditional models.
( 2
min )
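The paper's exact template is not reproduced here, so the following is only an illustrative guess at what a LaTeX-style row serialization for LLM prompting might look like:

    def latex_serialize(row: dict) -> str:
        # Render one tabular record as a one-row LaTeX table (hypothetical format).
        cols = list(row)
        header = " & ".join(cols)
        values = " & ".join(str(row[c]) for c in cols)
        return ("\\begin{tabular}{" + "l" * len(cols) + "}\n"
                + header + " \\\\ \\hline\n"
                + values + "\n\\end{tabular}")

    print(latex_serialize({"age": 39, "education": "Bachelors", "hours": 40}))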
Drivers can sustain serious injuries in traffic accidents. In this study,
traffic crashes on Florida's Interstate-95 from 2016 to 2021 were gathered, and
several classification methods were used to estimate the severity of driver
injuries. For feature selection, logistic regression was applied. To compare
model performance, various assessment metrics such as accuracy,
recall, and area under curve (AUC) were developed. The Adaboost algorithm
outperformed the others in terms of recall and AUC. SHAP values were also
generated to explain the classification model's results. This analytical study
can be used to examine factors that contribute to the severity of driver
injuries in crashes.
( 2
min )
This paper presents a novel approach for analysing EEG data from drivers in a
simulated driving test. We focused on the Hurst exponent, Shannon entropy, and
fractal dimension as markers of the nonlinear dynamics of the brain. The
results show significant trends: Shannon Entropy and Fractal Dimension exhibit
variations during driving condition transitions, whereas the Hurst exponent
reflects memory retention, portraying learning patterns. These findings suggest
that the tools of Non-linear Dynamical (NLD) theory can serve as indicators of
cognitive state and driving memory changes, useful for assessing driver
performance and advancing the understanding of the non-linear dynamics of human
cognition in the context of driving and beyond. Our study reveals the potential
of NLD tools to elucidate
brain state and system variances, enabling their integration into current Deep
Learning and Machine Learning models. This integration can extend beyond
driving applications and be harnessed for cognitive learning, thereby improving
overall productivity and accuracy levels.
( 2
min )
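For readers unfamiliar with the markers above, a textbook rescaled-range (R/S) estimator of the Hurst exponent is sketched below; H near 0.5 indicates no memory, H above 0.5 long-range memory. This illustrates the quantity, not the authors' exact pipeline.

    import numpy as np

    def hurst_rs(x, min_chunk=8):
        # Slope of log(R/S) against log(window size) over a range of windows.
        x = np.asarray(x, dtype=float)
        sizes = np.unique(np.logspace(
            np.log10(min_chunk), np.log10(len(x) // 2), 10).astype(int))
        rs = []
        for n in sizes:
            chunks = x[: len(x) // n * n].reshape(-1, n)
            dev = np.cumsum(chunks - chunks.mean(axis=1, keepdims=True), axis=1)
            r = dev.max(axis=1) - dev.min(axis=1)   # range of cumulative deviations
            s = chunks.std(axis=1)                  # chunk standard deviations
            rs.append(np.mean(r[s > 0] / s[s > 0]))
        H, _ = np.polyfit(np.log(sizes), np.log(rs), 1)
        return H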
Researchers are increasingly turning to machine learning (ML) algorithms to
investigate causal heterogeneity in randomized experiments. Despite their
promise, ML algorithms may fail to accurately ascertain heterogeneous treatment
effects under practical settings with many covariates and small sample size. In
addition, the quantification of estimation uncertainty remains a challenge. We
develop a general approach to statistical inference for heterogeneous treatment
effects discovered by a generic ML algorithm. We apply Neyman's repeated
sampling framework to a common setting, in which researchers use an ML
algorithm to estimate the conditional average treatment effect and then divide
the sample into several groups based on the magnitude of the estimated effects.
We show how to estimate the average treatment effect within each of these
groups, and construct a valid confidence interval. In addition, we develop
nonparametric tests of treatment effect homogeneity across groups, and
rank-consistency of within-group average treatment effects. The validity of our
methodology does not rely on the properties of ML algorithms because it is
solely based on the randomization of treatment assignment and random sampling
of units. Finally, we generalize our methodology to the cross-fitting procedure
by accounting for the additional uncertainty induced by the random splitting of
data.
( 3
min )
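A simplified version of the core procedure, omitting the paper's additional sampling-based corrections, reads roughly as follows: rank units by the ML-estimated CATE, split them into equal-sized groups, and report a difference-in-means ATE with the conservative Neyman variance per group.

    import numpy as np

    def group_ate_ci(cate_hat, y, t, n_groups=4, z=1.96):
        # cate_hat: ML estimates of conditional effects; y: outcomes; t: 0/1 treatment
        order = np.argsort(cate_hat)
        results = []
        for g in np.array_split(order, n_groups):
            y1, y0 = y[g][t[g] == 1], y[g][t[g] == 0]
            ate = y1.mean() - y0.mean()
            se = np.sqrt(y1.var(ddof=1) / len(y1) + y0.var(ddof=1) / len(y0))
            results.append((ate, ate - z * se, ate + z * se))
        return results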
Recent advances in practical quantum computing have led to a variety of
cloud-based quantum computing platforms that allow researchers to evaluate
their algorithms on noisy intermediate-scale quantum (NISQ) devices. A common
property of quantum computers is that they can exhibit instances of true
randomness as opposed to pseudo-randomness obtained from classical systems.
Investigating the effects of such true quantum randomness in the context of
machine learning is appealing, and recent results vaguely suggest that benefits
can indeed be achieved from the use of quantum random numbers. To shed some
more light on this topic, we empirically study the effects of hardware-biased
quantum random numbers on the initialization of artificial neural network
weights in numerical experiments. We find no statistically significant
difference in comparison with unbiased quantum random numbers as well as biased
and unbiased random numbers from a classical pseudo-random number generator.
The quantum random numbers for our experiments are obtained from real quantum
hardware.
( 2
min )
We establish explicit dynamics for neural networks whose training objective
has a regularising term that constrains the parameters to remain close to their
initial value. This keeps the network in a lazy training regime, where the
dynamics can be linearised around the initialisation. The standard neural
tangent kernel (NTK) governs the evolution during the training in the
infinite-width limit, although the regularisation yields an additional term in
the differential equation describing the dynamics. This setting
provides an appropriate framework to study the evolution of wide networks
trained to optimise generalisation objectives such as PAC-Bayes bounds, and
hence potentially contribute to a deeper theoretical understanding of such
networks.
( 2
min )
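Illustratively, assuming gradient flow on $L(\theta) + \frac{\lambda}{2}\|\theta - \theta_0\|^2$ and the usual linearisation of $f$ around $\theta_0$, the function-space dynamics pick up an extra decay term on top of the standard NTK evolution (a sketch consistent with the description above, not necessarily the paper's exact setting):

    \dot{\theta}_t = -\nabla_\theta L(\theta_t) - \lambda (\theta_t - \theta_0)
    \quad \Longrightarrow \quad
    \dot{f}_t(x) = -\,\Theta(x, X)\, \nabla_f L\big(f_t(X)\big)
                   - \lambda \big( f_t(x) - f_0(x) \big),

where $\Theta$ is the neural tangent kernel and $X$ denotes the training inputs.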
Great customer experience provides a competitive edge and helps create brand differentiation. As per the Forrester report, The State Of Customer Obsession, 2022, being customer-first can make a sizable impact on an organization’s balance sheet, as organizations embracing this methodology are surpassing their peers in revenue growth. Despite contact centers being under constant pressure to […]
( 10
min )
I asked DALL-E3 (via chatgpt) for "a simple Christmas nativity scene with each element clearly labeled in large capital letters for a child who is learning to read."
"Please generate a simple Christmas nativity scene with each element clearly labeled in large capital letters for a child
( 3
min )
AI Weirdness: the strange side of machine learning
( 2
min )
AI made a splash this year — from Wall Street to the U.S. Congress — driven by a wave of developers aiming to make the world better. Here’s a look at AI in 2023 across agriculture, natural disasters, medicine and other areas worthy of a cocktail party conversation.
( 7
min )
Time to gear up, hunters — Capcom’s Monster Hunter: World joins the GeForce NOW library, bringing members the ultimate hunting experience on any device. It’s all part of an adventurous week, with nearly a dozen new games joining the cloud gaming service.
( 6
min )
We propose a Reinforcement-Learning-based system that would automatically
prescribe a hypothetical patient medications that may help the patient with
their mental-health-related speech disfluency, and adjust the medication and
the dosages in response to data from the patient. We demonstrate the components
of the system: a module that detects and evaluates speech disfluency on a large
dataset we built, and a Reinforcement Learning algorithm that automatically
finds good combinations of medications. To support the two modules, we collect
data on the effect of psychiatric medications for speech disfluency from the
literature, and build a plausible patient simulation system. We demonstrate
that the Reinforcement Learning system is, under some circumstances, able to
converge to a good medication regime. We collect and label a dataset of people
with possible speech disfluency and demonstrate our methods using that dataset.
Our work is a proof of concept: we show that there is promise in the idea of
using automatic data collection to address disfluency.
( 2
min )
We present XLand-MiniGrid, a suite of tools and grid-world environments for
meta-reinforcement learning research inspired by the diversity and depth of
XLand and the simplicity and minimalism of MiniGrid. XLand-MiniGrid is written
in JAX, designed to be highly scalable, and can potentially run on GPU or TPU
accelerators, democratizing large-scale experimentation with limited resources.
To demonstrate the generality of our library, we have implemented some
well-known single-task environments as well as new meta-learning environments
capable of generating $10^8$ distinct tasks. We have empirically shown that the
proposed environments can scale up to $2^{13}$ parallel instances on the GPU,
reaching tens of millions of steps per second.
( 2
min )
Reinforcement learning (RL) often struggles to accomplish a sparse-reward
long-horizon task in a complex environment. Goal-conditioned reinforcement
learning (GCRL) has been employed to tackle this difficult problem via a
curriculum of easy-to-reach sub-goals. In GCRL, exploring novel sub-goals is
essential for the agent to ultimately find the pathway to the desired goal. How
to explore novel sub-goals efficiently is one of the most challenging issues in
GCRL. Several goal exploration methods have been proposed to address this issue
but still struggle to find the desired goals efficiently. In this paper, we
propose a novel learning objective by optimizing the entropy of both achieved
and new goals to be explored for more efficient goal exploration in sub-goal
selection based GCRL. To optimize this objective, we first explore and exploit
the frequently occurring goal-transition patterns mined in the environments
similar to the current task to compose skills via skill learning. Then, the
pretrained skills are applied in goal exploration. Evaluation on a variety of
sparse-reward long-horizon benchmark tasks suggests that incorporating our
method into several state-of-the-art GCRL baselines significantly boosts their
exploration efficiency while improving or maintaining their performance. The
source code is available at: https://github.com/GEAPS/GEAPS.
( 3
min )
We train a language model (LM) to robustly answer multistep questions by
generating and answering sub-questions. We propose Chain-of-Questions, a
framework that trains a model to generate sub-questions and sub-answers one at
a time by leveraging human annotated question decomposition meaning
representation (QDMR). The key technical challenge is that QDMR only contains
sub-questions but not answers to those sub-questions, so we treat sub-answers
as latent variables and optimize them using a novel dynamic mixture of Hard-EM
and MAPO. Chain-of-Questions greatly outperforms strong neuro-symbolic methods
by 9.0 F1 on DROP contrast set, and outperforms GPT-3.5 by 24.3 F1 on HOTPOTQA
adversarial set, thus demonstrating the effectiveness and robustness of our
framework.
( 2
min )
The number of people suffering from some level of hearing loss reached
1.57 billion in 2019. This large population faces challenges on many personal
and professional levels and needs to be fully and healthily included in the
rest of society. This paper presents a proof of concept of an automatic sign language
recognition system based on data obtained using a wearable device of 3 flex
sensors. The system is designed to interpret a selected set of American Sign
Language (ASL) dynamic words by collecting data in sequences of the performed
signs and using machine learning methods. The built models achieved
high-quality performances, such as Random Forest with 99% accuracy, Support
Vector Machine (SVM) with 99%, and two K-Nearest Neighbor (KNN) models with
98%. This indicates many possible paths toward the development of a full-scale
system.
( 2
min )
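The classical-ML part of such a pipeline is straightforward; a minimal sketch with placeholder data standing in for the flex-sensor sequences (array shapes and labels are assumptions, not the collected dataset):

    import numpy as np
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import accuracy_score

    rng = np.random.default_rng(0)
    X = rng.normal(size=(500, 3 * 20))   # 500 signs x (3 sensors x 20 time steps)
    y = rng.integers(0, 10, size=500)    # 10 ASL words (dummy labels)

    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
    clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)
    print("accuracy:", accuracy_score(y_te, clf.predict(X_te)))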
Diffusion models have demonstrated strong potential for robotic trajectory
planning. However, generating coherent and long-horizon trajectories from
high-level instructions remains challenging, especially for complex tasks
requiring multiple sequential skills. We propose SkillDiffuser, an end-to-end
hierarchical planning framework integrating interpretable skill learning with
conditional diffusion planning to address this problem. At the higher level,
the skill abstraction module learns discrete, human-understandable skill
representations from visual observations and language instructions. These
learned skill embeddings are then used to condition the diffusion model to
generate customized latent trajectories aligned with the skills. It allows for
generating diverse state trajectories that adhere to the learnable skills. By
integrating skill learning with conditional trajectory generation,
SkillDiffuser produces coherent behavior following abstract instructions across
diverse tasks. Experiments on multi-task robotic manipulation benchmarks like
Meta-World and LOReL demonstrate state-of-the-art performance and
human-interpretable skill representations from SkillDiffuser.
( 2
min )
Legged locomotion is arguably the most suited and versatile mode to deal with
natural or unstructured terrains. Intensive research into dynamic walking and
running controllers has recently yielded great advances, both in the optimal
control and reinforcement learning (RL) literature. Hopping is a challenging
dynamic task involving a flight phase and has the potential to increase the
traversability of legged robots. Model-based control for hopping typically
relies on accurate detection of different jump phases, such as lift-off or
touch-down, and uses different controllers for each phase. In this paper, we
present an end-to-end RL-based torque controller that learns to implicitly
detect the relevant jump phases, removing the need to provide manual heuristics
for state detection. We also extend a method for simulation to reality transfer
of the learned controller to contact rich dynamic tasks, resulting in
successful deployment on the robot after training without parameter tuning.
( 3
min )
Recently, large language models (LLMs) have made remarkable progress in
natural language processing. The most representative ability of LLMs is
in-context learning (ICL), which enables LLMs to learn patterns from in-context
exemplars without training. The performance of ICL greatly depends on the
exemplars used. However, how to choose exemplars remains unclear due to the
lack of understanding of how in-context learning works. In this paper, we
present a novel perspective on ICL by conceptualizing it as contextual
retrieval from a model of associative memory. We establish a theoretical
framework of ICL based on Hopfield Networks. Based on our framework, we look
into how in-context exemplars influence the performance of ICL and propose more
efficient active exemplar selection. Our study sheds new light on the mechanism
of ICL by connecting it to memory retrieval, with potential implications for
advancing the understanding of LLMs.
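To make the associative-memory reading concrete, the following is a minimal
NumPy sketch of one step of modern (continuous) Hopfield retrieval, whose
softmax form mirrors attention; it illustrates the retrieval mechanism only
and is not the paper's exact construction.

```python
import numpy as np

def hopfield_retrieve(query, memories, beta=4.0):
    """One step of modern (continuous) Hopfield retrieval: a softmax-weighted
    lookup over stored patterns, the same functional form as attention."""
    scores = beta * memories @ query              # similarity to each stored pattern
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                      # softmax over memories
    return weights @ memories                     # convex combination of patterns

# Toy illustration: in-context exemplar embeddings act as stored memories,
# and a noisy test input acts as the query.
rng = np.random.default_rng(0)
memories = rng.normal(size=(8, 16))               # 8 exemplars, dimension 16
query = memories[3] + 0.1 * rng.normal(size=16)
retrieved = hopfield_retrieve(query, memories)
print(np.argmax(memories @ retrieved))            # expected to recover pattern 3
```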
( 2
min )
Lipschitz-constrained neural networks have several advantages over
unconstrained ones and can be applied to a variety of problems, making them a
topic of attention in the deep learning community. Unfortunately, it has been
shown both theoretically and empirically that they perform poorly when equipped
with ReLU activation functions. By contrast, neural networks with learnable
1-Lipschitz linear splines are known to be more expressive. In this paper, we
show that such networks correspond to global optima of a constrained functional
optimization problem that consists of the training of a neural network composed
of 1-Lipschitz linear layers and 1-Lipschitz freeform activation functions with
second-order total-variation regularization. Further, we propose an efficient
method to train these neural networks. Our numerical experiments show that our
trained networks compare favorably with existing 1-Lipschitz neural
architectures.
( 2
min )
In this paper, we explore transferability in learning between different
attack classes in a network intrusion detection setup. We evaluate
transferability of attack classes by training a deep learning model with a
specific attack class and testing it on a separate attack class. We observe the
effects of real and synthetically generated data augmentation techniques on
transferability. We investigate the nature of observed transferability
relationships, which can be either symmetric or asymmetric. We also examine
explainability of the transferability relationships using the recursive feature
elimination algorithm. We study data preprocessing techniques to boost model
performance. The code for this work can be found at
https://github.com/ghosh64/transferability.
( 2
min )
In this work we develop a novel approach using deep neural networks to
reconstruct the conductivity distribution in elliptic problems from one
measurement of the solution over the whole domain. The approach is based on a
mixed reformulation of the governing equation and utilizes the standard
least-squares objective, with deep neural networks as ansatz functions to
approximate the conductivity and flux simultaneously. We provide a thorough
analysis of the deep neural network approximations of the conductivity for both
continuous and empirical losses, including rigorous error estimates that are
explicit in terms of the noise level, various penalty parameters and neural
network architectural parameters (depth, width and parameter bound). We also
provide multiple numerical experiments in two and higher dimensions to
illustrate distinct features of the approach, e.g., excellent stability with
respect to data noise and capability of solving high-dimensional problems.
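To sketch what such a mixed least-squares formulation can look like (our
rendering under assumed details; the paper's exact penalty terms may differ),
take the model problem $-\nabla\cdot(q\nabla u) = f$, introduce the flux
$\sigma = q\nabla u$ as a second unknown, and minimize over networks
$q_\theta$ and $\sigma_\phi$ the objective
$$J(\theta,\phi) = \|\sigma_\phi - q_\theta \nabla u^\delta\|_{L^2(\Omega)}^2
+ \|\nabla\cdot\sigma_\phi + f\|_{L^2(\Omega)}^2 + \text{penalty terms},$$
where $u^\delta$ denotes the noisy measurement of the solution over the whole
domain.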
( 2
min )
This paper introduces JaxPruner, an open-source JAX-based pruning and sparse
training library for machine learning research. JaxPruner aims to accelerate
research on sparse neural networks by providing concise implementations of
popular pruning and sparse training algorithms with minimal memory and latency
overhead. Algorithms implemented in JaxPruner use a common API and work
seamlessly with the popular optimization library Optax, which, in turn, enables
easy integration with existing JAX-based libraries. We demonstrate this ease of
integration by providing examples in four different codebases: Scenic, t5x,
Dopamine, and FedJAX, and we provide baseline experiments on popular benchmarks.
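JaxPruner's actual API is defined in its repository; as a hedged sketch of the
kind of transform such a library provides, here is global unstructured
magnitude pruning over a JAX parameter pytree (all function names below are
our own, not JaxPruner's).

```python
import jax.numpy as jnp
from jax import tree_util

def magnitude_mask(params, sparsity=0.8):
    """Global magnitude pruning: build a 0/1 mask that zeros out the
    smallest-magnitude fraction of weights across the whole pytree."""
    flat = jnp.concatenate([jnp.abs(w).ravel() for w in tree_util.tree_leaves(params)])
    threshold = jnp.sort(flat)[int(sparsity * flat.size)]   # prune below this value
    return tree_util.tree_map(lambda w: (jnp.abs(w) >= threshold).astype(w.dtype), params)

def apply_mask(params, mask):
    """Element-wise masking, typically reapplied after every optimizer step."""
    return tree_util.tree_map(lambda w, m: w * m, params, mask)
```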
( 2
min )
In this study, we propose a new activation function, called Adaptive Smooth
Activation Unit (ASAU), tailored for optimized gradient propagation, thereby
enhancing the proficiency of convolutional networks in medical image analysis.
We apply this new activation function to two important and commonly used
general tasks in medical image analysis: automatic disease diagnosis and organ
segmentation in CT and MRI. Our rigorous evaluation on the RadImageNet
abdominal/pelvis (CT and MRI) dataset and Liver Tumor Segmentation Benchmark
(LiTS) 2017 demonstrates that our ASAU-integrated frameworks not only achieve a
substantial (4.80%) improvement over ReLU in classification accuracy (disease
detection) on abdominal CT and MRI but also achieve a 1%-3% improvement in
Dice coefficient compared to widely used activations for `healthy liver tissue'
segmentation. These improvements offer new baselines for developing a
diagnostic tool, particularly for complex, challenging pathologies. The
superior performance and adaptability of ASAU highlight its potential for
integration into a wide range of image classification and segmentation tasks.
( 2
min )
In recent years, significant progress in generative AI has highlighted the
important role of physics-inspired models that utilize advanced mathematical
concepts based on fundamental physics principles to enhance artificial
intelligence capabilities. Among these models, those based on diffusion
equations have greatly improved image quality. This study aims to explore the
potential uses of the Maxwell-Boltzmann equation, which forms the basis of the
kinetic theory of gases, and the Michaelis-Menten model in Marketing Mix
Modelling (MMM) applications. We propose incorporating these equations into
Hierarchical Bayesian models to analyse consumer behaviour in the context of
advertising. These equation sets excel in accurately describing the random
dynamics in complex systems like social interactions and consumer-advertising
interactions.
( 2
min )
Recent work by Marino et al. (2020) showed improved performance in sequential
density estimation by combining masked autoregressive flows with hierarchical
latent variable models. We draw a connection between such autoregressive
generative models and the task of lossy video compression. Specifically, we
view recent neural video compression methods (Lu et al., 2019; Yang et al.,
2020b; Agustsson et al., 2020) as instances of a generalized stochastic temporal
autoregressive transform, and propose avenues for enhancement based on this
insight. Comprehensive evaluations on large-scale video data show improved
rate-distortion performance over both state-of-the-art neural and conventional
video compression methods.
( 2
min )
Diffusion-based generative models represent the current state-of-the-art for
image generation. However, standard diffusion models are based on Euclidean
geometry and do not translate directly to manifold-valued data. In this work,
we develop extensions of both score-based generative models (SGMs) and
Denoising Diffusion Probabilistic Models (DDPMs) to the Lie group of 3D
rotations, SO(3). SO(3) is of particular interest in many disciplines such as
robotics, biochemistry, and astronomy/cosmology. Contrary to more
general Riemannian manifolds, SO(3) admits a tractable solution to heat
diffusion, and allows us to implement efficient training of diffusion models.
We apply both SO(3) DDPMs and SGMs to synthetic densities on SO(3) and
demonstrate state-of-the-art results. Additionally, we demonstrate the
practicality of our model on pose estimation tasks and in predicting correlated
galaxy orientations for astrophysics/cosmology.
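The tractable heat kernel alluded to is the standard character expansion on
SO(3) (recalled here for context, not quoted from the paper): as a function of
the rotation angle $\omega \in [0, \pi]$ and diffusion time $t$,
$$f_t(\omega) = \sum_{\ell=0}^{\infty} (2\ell+1)\, e^{-\ell(\ell+1)t}\,
\frac{\sin\bigl((\ell+\tfrac{1}{2})\omega\bigr)}{\sin(\omega/2)},$$
and truncating the series gives efficient likelihoods and samplers for
training.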
( 2
min )
As large language models (LLMs) like ChatGPT have gained traction, an
increasing number of news websites have begun utilizing them to generate
articles. However, not only can these language models produce factually
inaccurate articles on reputable websites but disreputable news sites can
utilize LLMs to mass produce misinformation. To begin to understand this
phenomenon, we present one of the first large-scale studies of the prevalence
of synthetic articles within online news media. To do this, we train a
DeBERTa-based synthetic news detector and classify over 15.90 million articles
from 3,074 misinformation and mainstream news websites. We find that between
January 1, 2022, and May 1, 2023, the relative number of synthetic news
articles increased by 55.4% on mainstream websites while increasing by 457% on
misinformation sites. We find that this increase is largely driven by smaller,
less popular websites. Analyzing the impact of the release of ChatGPT using an
interrupted time series analysis, we show that while its release resulted in a marked
increase in synthetic articles on small sites as well as misinformation news
websites, there was not a corresponding increase on large mainstream news
websites.
( 3
min )
Powered by new advances in sensor development and artificial intelligence,
the decreasing cost of computation, and the pervasiveness of handheld
computation devices, biometric user authentication (and identification) is
rapidly becoming ubiquitous. Modern approaches to biometric authentication,
based on sophisticated machine learning techniques, cannot avoid storing either
trained-classifier details or explicit user biometric data, thus exposing
users' credentials to falsification. In this paper, we introduce a secure way
to handle user-specific information involved with the use of vector-space
classifiers or artificial neural networks for biometric authentication. Our
proposed architecture, called a Neural Fuzzy Extractor (NFE), allows the
coupling of pre-existing classifiers with fuzzy extractors, through an
artificial-neural-network-based buffer called an expander, with minimal or no
performance degradation. The NFE thus offers all the performance advantages of
modern deep-learning-based classifiers, and all the security of standard fuzzy
extractors. We demonstrate the NFE retrofit to a classic artificial neural
network for a simple scenario of fingerprint-based user authentication.
( 3
min )
This paper presents the computational challenge on topological deep learning
that was hosted within the ICML 2023 Workshop on Topology and Geometry in
Machine Learning. The competition asked participants to provide open-source
implementations of topological neural networks from the literature by
contributing to the Python packages TopoNetX (data processing) and TopoModelX
(deep learning). The challenge attracted twenty-eight qualifying submissions in
its two-month duration. This paper describes the design of the challenge and
summarizes its main findings.
( 2
min )
Objective: Early identification of ADHD is necessary to provide the
opportunity for timely treatment. However, screening the symptoms of ADHD on a
large scale is not easy. This study aimed to validate a video game (FishFinder)
for the screening of ADHD using objective measurement of the core symptoms of
this disorder. Method: The FishFinder measures attention and impulsivity
through in-game performance and evaluates the child's hyperactivity using
smartphone motion sensors. This game was tested on 26 children with ADHD and 26
healthy children aged 5 to 12 years. A Support Vector Machine was employed to
detect children with ADHD. Results: This system showed 92.3% accuracy, 90%
sensitivity, and 93.7% specificity using a combination of in-game and movement
features. Conclusions: The FishFinder demonstrated a strong ability to identify
ADHD in children, and can therefore be used as an affordable, accessible, and
enjoyable method for the objective screening of ADHD.
( 2
min )
Accurately predicting line loss rates is vital for effective line loss
management in distribution networks, especially over short-term multi-horizons
ranging from one hour to one week. In this study, we propose
Attention-GCN-LSTM, a novel method that combines Graph Convolutional Networks
(GCN), Long Short-Term Memory (LSTM), and a three-level attention mechanism to
address this challenge. By capturing spatial and temporal dependencies, our
model enables accurate forecasting of line loss rates across multiple horizons.
Through comprehensive evaluation using real-world data from 10 kV feeders, our
Attention-GCN-LSTM model consistently outperforms existing algorithms,
exhibiting superior performance in terms of prediction accuracy and
multi-horizon forecasting. This model holds significant promise for enhancing
line loss management in distribution networks.
( 2
min )
Text classification is an important topic in the field of natural language
processing. It has been applied in information retrieval, digital libraries,
automatic abstracting, text filtering, word semantic discrimination, and many
other fields. The aim of this research is to use a variety of algorithms to
test the ability to identify offensive posts and evaluate their performance
against a variety of assessment methods. The motivation for this project is to
reduce the harm that such language causes to human moderators by automating
the screening of offensive posts. The field is a new one, and despite much
interest in the past two years, little attention has been paid to the target
of the offence. The experiments in this project should inspire future research
on both identification methods and identification content.
( 2
min )
Causal discovery with latent variables is a crucial but challenging task.
Despite the emergence of numerous methods aimed at addressing this challenge,
they cannot fully identify the structure in which two observed variables are
influenced by one latent variable while a directed edge may also exist between
them. Interestingly, we notice that this structure can be identified through
the utilization of higher-order cumulants. By leveraging the higher-order
cumulants of non-Gaussian data, we provide an analytical solution for
estimating the causal coefficients or their ratios. With the estimated (ratios
of) causal coefficients, we propose a novel approach to identify the existence
of a causal edge between two observed variables subject to latent variable
influence. In the case when such a causal edge exists, we introduce an asymmetry
criterion to determine the causal direction. The experimental results
demonstrate the effectiveness of our proposed method.
( 2
min )
De novo peptide sequencing from mass spectrometry (MS) data is a critical
task in proteomics research. Traditional de novo algorithms have encountered a
bottleneck in accuracy due to the inherent complexity of proteomics data. While
deep learning-based methods have shown progress, they reduce the problem to a
translation task, potentially overlooking critical nuances between spectra and
peptides. In our research, we present ContraNovo, a pioneering algorithm that
leverages contrastive learning to extract the relationship between spectra and
peptides and incorporates the mass information into peptide decoding, aiming to
address these intricacies more efficiently. Through rigorous evaluations on two
benchmark datasets, ContraNovo consistently outshines contemporary
state-of-the-art solutions, underscoring its promising potential in enhancing
de novo peptide sequencing. The source code is available at
https://github.com/BEAM-Labs/ContraNovo.
( 2
min )
In this paper, we focus on the prediction phase of a random forest and study
the problem of representing a bag of decision trees using a smaller bag of
decision trees, where we only consider binary decision problems on the binary
domain and simple decision trees in which an internal node is limited to
querying the Boolean value of a single variable. As a main result, we show that
the majority function of $n$ variables can be represented by a bag of $T$ ($<
n$) decision trees each with polynomial size if $n-T$ is a constant, where $n$
and $T$ must be odd (in order to avoid the tie break). We also show that a bag
of $n$ decision trees can be represented by a bag of $T$ decision trees each
with polynomial size if $n-T$ is a constant and a small classification error is
allowed. A related result on the $k$-out-of-$n$ functions is presented too.
( 2
min )
Drawing on theoretical insights, we advocate an error-based thresholding
(EBT) mechanism for learned ISTA (LISTA), which utilizes a function of the
layer-wise reconstruction error to suggest a specific threshold for each
observation in the shrinkage function of each layer. We show that the proposed
EBT mechanism well disentangles the learnable parameters in the shrinkage
functions from the reconstruction errors, endowing the obtained models with
improved adaptivity to possible data variations. With rigorous analyses, we
further show that the proposed EBT also leads to a faster convergence on the
basis of LISTA or its variants, in addition to its higher adaptivity. Extensive
experimental results confirm our theoretical analyses and verify the
effectiveness of our methods.
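As a minimal rendering of the mechanism (our own sketch; the paper's
parameterization of the threshold function and its learnable parameters may
differ), one LISTA-style layer with an error-based threshold looks like:

```python
import numpy as np

def soft_threshold(z, theta):
    return np.sign(z) * np.maximum(np.abs(z) - theta, 0.0)

def ebt_lista_layer(x, y, A, W, gamma):
    """One LISTA-style iteration where the shrinkage threshold is a function
    of the current reconstruction error, so each observation y receives its
    own threshold. gamma plays the role of a learnable scale."""
    residual = y - A @ x                          # layer-wise reconstruction error
    theta = gamma * np.linalg.norm(residual)      # error-based threshold
    return soft_threshold(x + W @ residual, theta)
```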
( 2
min )
Causal Structure Learning (CSL), amounting to extracting causal relations
among the variables in a dataset, is widely perceived as an important step
towards robust and transparent models. Constraint-based CSL leverages
conditional independence tests to perform causal discovery. We propose
Shapley-PC, a novel method to improve constraint-based CSL algorithms by using
Shapley values over the possible conditioning sets to decide which variables
are responsible for the observed conditional (in)dependences. We prove
soundness and asymptotic consistency and demonstrate that it can outperform
state-of-the-art constraint-based, search-based and functional causal
model-based methods, according to standard metrics in CSL.
( 2
min )
A large body of NLP research has documented the ways gender biases manifest
and amplify within large language models (LLMs), though this research has
predominantly operated within a gender binary-centric context. A growing body
of work has identified the harmful limitations of this gender-exclusive
framing; many LLMs cannot correctly and consistently refer to persons outside
the gender binary, especially if they use neopronouns. While data scarcity has
been identified as a possible culprit, the precise mechanisms through which it
influences LLM misgendering remain underexplored. Our work addresses this gap
by studying data scarcity's role in subword tokenization and, consequently, the
formation of LLM word representations. We uncover how the Byte-Pair Encoding
(BPE) tokenizer, a backbone for many popular LLMs, contributes to neopronoun
misgendering through out-of-vocabulary behavior. We introduce pronoun
tokenization parity (PTP), a novel approach to reduce LLM neopronoun
misgendering by preserving a token's functional structure. We evaluate PTP's
efficacy using pronoun consistency-based metrics and a novel syntax-based
metric. In several controlled experiments, fine-tuning LLMs with PTP
improves neopronoun consistency from 14.5% to 58.4%, highlighting the
significant role tokenization plays in LLM pronoun consistency.
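The out-of-vocabulary behavior is easy to probe directly; the snippet below
(the tokenizer choice is illustrative, and the exact splits depend on the
learned BPE vocabulary rather than on results from the paper) contrasts
frequent pronouns with neopronouns.

```python
from transformers import AutoTokenizer

# Frequent pronouns typically remain single tokens, while neopronouns are
# often split into several subword pieces by a BPE vocabulary.
tok = AutoTokenizer.from_pretrained("gpt2")
for pronoun in [" they", " she", " xe", " faer"]:
    print(repr(pronoun), "->", tok.tokenize(pronoun))
```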
( 3
min )
Chronic Obstructive Pulmonary Disorder (COPD) is a prevalent respiratory
disease that significantly impacts the quality of life of affected individuals.
This paper presents COPDFlowNet, a novel deep-learning framework that leverages
a custom Generative Adversarial Network (GAN) to generate synthetic
Computational Fluid Dynamics (CFD) velocity flow field images specific to the
trachea of COPD patients. These synthetic images serve as a valuable resource
for data augmentation and model training. Additionally, COPDFlowNet
incorporates a custom Convolutional Neural Network (CNN) architecture to
predict the location of the obstruction site.
( 2
min )
In recent years, pedestrian simulations using multi-agent reinforcement
learning (MARL) have been studied. This study considered roads in a grid-world
environment and implemented pedestrians as MARL agents using an echo-state
network and the least-squares policy iteration method. In this environment,
the ability of these agents to learn to move forward while avoiding other
agents was investigated. Specifically, we considered two types of tasks: the
choice between a narrow direct route and a broad detour, and bidirectional
pedestrian flow in a corridor. The simulation results indicated that learning
was successful when the density of agents was not too high.
( 2
min )
Multi-document summarization is the process of automatically generating a
concise summary of multiple documents related to the same topic. This summary
can help users quickly understand the key information from a large collection
of documents. Multi-document summarization systems are more complex than
single-document summarization systems due to the need to identify and combine
information from multiple sources. In this paper, we have developed a machine
learning model that generates a concise summary of a topic from multiple news
documents. The model is designed to be unbiased by sampling its input equally
from all the different aspects of the topic, even if the majority of the news
sources lean one way.
( 2
min )
Graph Neural Networks are notorious for their memory consumption. A recent
Transformer-based GNN, the Graph Transformer (GT), has been shown to obtain
superior performance when long-range dependencies exist. However, combining
graph data with the Transformer architecture compounds the memory problem. We
propose a novel version of an "edge regularization technique" that alleviates
the need for positional encoding and ultimately mitigates GT's out-of-memory
issue. We observe that it is not clear whether edge regularization on top of
positional encoding is helpful. However, when no positional encoding is
applied, the edge regularization technique does stably improve GT's
performance.
( 2
min )
In this work we introduce Labrador, a pre-trained Transformer model for
laboratory data. Labrador and BERT were pre-trained on a corpus of 100 million
lab test results from electronic health records (EHRs) and evaluated on various
downstream outcome prediction tasks. Both models demonstrate mastery of the
pre-training task, but neither consistently outperforms XGBoost on downstream
supervised tasks. Our ablation studies reveal that transfer learning shows
limited effectiveness for BERT and achieves marginal success with Labrador. We
explore the reasons for the failure of transfer learning and suggest that the
data generating process underlying each patient cannot be characterized
sufficiently using labs alone, among other factors. We encourage future work to
focus on joint modeling of multiple EHR data categories and to include
tree-based baselines in their evaluations.
( 2
min )
Graph-based collaborative filtering methods achieve strong performance in
recommender systems since they can capture high-order information between users
and items, in which the graphs are constructed from the observed user-item
interactions that might miss links or contain spurious positive interactions in
industrial scenarios. The Bayesian Graph Neural Network framework approaches
this issue with generative models for the interaction graphs. The critical
problem is to devise a proper family of graph generative models tailored to
recommender systems. We propose an efficient generative model that jointly
considers the preferences of users, the concurrence of items and some important
graph structure information. Experiments on four popular benchmark datasets
demonstrate the effectiveness of our proposed graph generative methods for
recommender systems.
( 2
min )
Motivated by applications in queueing theory, we consider a stochastic
control problem whose state space is the $d$-dimensional positive orthant. The
controlled process $Z$ evolves as a reflected Brownian motion whose covariance
matrix is exogenously specified, as are its directions of reflection from the
orthant's boundary surfaces. A system manager chooses a drift vector
$\theta(t)$ at each time $t$ based on the history of $Z$, and the cost rate at
time $t$ depends on both $Z(t)$ and $\theta(t)$. In our initial problem
formulation, the objective is to minimize expected discounted cost over an
infinite planning horizon, after which we treat the corresponding ergodic
control problem. Extending earlier work by Han et al. (Proceedings of the
National Academy of Sciences, 2018, 8505-8510), we develop and illustrate a
simulation-based computational method that relies heavily on deep neural
network technology. For test problems studied thus far, our method is accurate
to within a fraction of one percent, and is computationally feasible in
dimensions up to at least $d=30$.
( 2
min )
We provide a systematic investigation of using physics-informed neural
networks to compute Lyapunov functions. We encode Lyapunov conditions as a
partial differential equation (PDE) and use this for training neural network
Lyapunov functions. We analyze the analytical properties of the solutions to
the Lyapunov and Zubov PDEs. In particular, we show that employing the Zubov
equation in training neural Lyapunov functions can lead to approximate regions
of attraction close to the true domain of attraction. We also examine
approximation errors and the convergence of neural approximations to the unique
solution of Zubov's equation. We then provide sufficient conditions for the
learned neural Lyapunov functions that can be readily verified by
satisfiability modulo theories (SMT) solvers, enabling formal verification of
both local stability analysis and region-of-attraction estimates in the large.
Through a number of nonlinear examples, ranging from low to high dimensions, we
demonstrate that the proposed framework can outperform traditional
sums-of-squares (SOS) Lyapunov functions obtained using semidefinite
programming (SDP).
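As a sketch of what training against the Zubov PDE involves (one common form
of the equation; the paper's exact formulation and choice of rate function may
differ), the pointwise residual can be written as:

```python
import jax
import jax.numpy as jnp

def zubov_residual(W, f, x):
    """Residual of one common form of Zubov's equation,
        <grad W(x), f(x)> + psi(x) * (1 - W(x)) = 0,
    where W is a candidate (neural) Lyapunov-like function with W(0) = 0.
    psi(x) = |x|^2 is an illustrative positive-definite rate function."""
    grad_w = jax.grad(W)(x)
    psi = jnp.sum(x ** 2)
    return jnp.dot(grad_w, f(x)) + psi * (1.0 - W(x))

# A physics-informed loss averages zubov_residual(...)**2 over sampled states,
# plus boundary terms such as a penalty enforcing W(0) = 0.
```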
( 2
min )
We find a succinct expression for computing the sequence $x_t = a_t x_{t-1} +
b_t$ in parallel with two prefix sums, given coefficients $a \in \mathbb{R}^n$
and $b \in \mathbb{R}^n$ indexed by $t = 1, 2, \dots, n$, and an initial value
$x_0 \in \mathbb{R}$.
On $n$ parallel processors, the computation of $n$ elements incurs
$\mathcal{O}(\log n)$ time and $\mathcal{O}(n)$ space. Sequences of this form
are ubiquitous in science and engineering, making efficient parallelization
useful for a vast number of applications. We implement our expression in
software, test it on parallel hardware, and verify that it executes faster than
sequential computation by a factor of $\frac{n}{\log n}$.
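One standard way to realize such a reduction (a sketch consistent with the
abstract; the paper's exact expression and any numerical stabilization may
differ) is a prefix product followed by a prefix sum:

```python
import numpy as np

def linear_recurrence(a, b, x0):
    """Evaluate x_t = a_t * x_{t-1} + b_t via two prefix operations, using
    x_t = A_t * (x_0 + sum_{i<=t} b_i / A_i) with A_t = prod_{i<=t} a_i.
    Both cumprod and cumsum admit O(log n) parallel scans. Note: dividing by
    the running product can over/underflow for long or ill-scaled sequences."""
    A = np.cumprod(a)                    # first prefix operation
    return A * (x0 + np.cumsum(b / A))   # second prefix operation

# Check against the sequential recurrence.
rng = np.random.default_rng(0)
a, b, x0 = rng.normal(size=5), rng.normal(size=5), 1.0
x, x_seq = x0, []
for at, bt in zip(a, b):
    x = at * x + bt
    x_seq.append(x)
assert np.allclose(linear_recurrence(a, b, x0), x_seq)
```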
( 2
min )
Air pollution results from multiple sources, including both natural and
anthropogenic activities. The rapid urbanization of cities such as Bujumbura,
the economic capital of Burundi, is one of these factors. The very first
characterization of the spatio-temporal variability of PM2.5 in Bujumbura and
the forecasting of PM2.5 concentration are conducted in this paper using data
collected over one year, from August 2022 to August 2023, by low-cost sensors
installed in Bujumbura city. For each commune, hourly, daily and seasonal
analyses were carried out, and the results showed that the mass concentrations
of PM2.5 differ from one commune to another. The average hourly and annual
PM2.5 concentrations exceed World Health Organization standards, ranging
between 28.3 and 35.0 microgram/m3. To predict PM2.5 concentration, recurrent
neural networks with Long Short-Term Memory (LSTM) were investigated.
( 2
min )
Over the last decade, the Dip-test of unimodality has gained increasing
interest in the data mining community as it is a parameter-free statistical
test that reliably rates the modality in one-dimensional samples. It returns a
so-called Dip-value and a corresponding probability for the sample's
unimodality (Dip-p-value). These two values share a sigmoidal relationship.
However, the specific transformation is dependent on the sample size. Many
Dip-based clustering algorithms use bootstrapped look-up tables translating
Dip- to Dip-p-values for a limited set of sample sizes. We propose a
specifically designed sigmoid function as a substitute for these
state-of-the-art look-up tables. This accelerates computation and provides an
approximation of the Dip- to Dip-p-value transformation for every single sample
size. Further, it is differentiable and can therefore easily be integrated in
learning schemes using gradient descent. We showcase this by exploiting our
function in a novel subspace clustering algorithm called Dip'n'Sub. We
highlight in extensive experiments the various benefits of our proposal.
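For intuition, such a sample-size-aware sigmoid substitute could take the
following shape (the functional form and constants here are placeholders for
the paper's fitted, differentiable parameters, not its actual values):

```python
import numpy as np

def dip_p_value(dip, n, alpha=20.0, beta=1.0):
    """Illustrative sigmoid substitute for Dip-p-value look-up tables. The Dip
    statistic shrinks roughly like 1/sqrt(n), so the input is normalized by
    sqrt(n); the mapping is differentiable and decreasing in the Dip-value."""
    z = np.sqrt(n) * dip
    return 1.0 / (1.0 + np.exp(alpha * (z - beta)))
```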
( 3
min )
Time-series anomaly detection deals with the problem of detecting anomalous
timesteps by learning normality from the sequence of observations. However, the
concept of normality evolves over time, leading to a "new normal problem",
where the distribution of normality can be changed due to the distribution
shifts between training and test data. This paper highlights the prevalence of
the new normal problem in unsupervised time-series anomaly detection studies.
To tackle this issue, we propose a simple yet effective test-time adaptation
strategy based on trend estimation and a self-supervised approach to learning
new normalities during inference. Extensive experiments on real-world
benchmarks demonstrate that incorporating the proposed strategy into the
anomaly detector consistently improves the model's performance compared to the
baselines, improving robustness to distribution shifts.
( 2
min )
We provide an optimized implementation of the forward pass of
FlashAttention-2, a popular memory-aware scaled dot-product attention
algorithm, as a custom fused CUDA kernel targeting NVIDIA Hopper architecture
and written using the open-source CUTLASS library. In doing so, we explain the
challenges and techniques involved in fusing online-softmax with back-to-back
GEMM kernels, utilizing the Hopper-specific Tensor Memory Accelerator (TMA) and
Warpgroup Matrix-Multiply-Accumulate (WGMMA) instructions, defining and
transforming CUTLASS Layouts and Tensors, overlapping copy and GEMM operations,
and choosing optimal tile sizes for the Q, K and V attention matrices while
balancing the register pressure and shared memory utilization. In head-to-head
benchmarks on a single H100 PCIe GPU for some common choices of
hyperparameters, we observe 20-50% higher FLOPs/s over a version of
FlashAttention-2 optimized for last-generation NVIDIA Ampere architecture.
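The algorithmic core that gets fused is the online-softmax recurrence; below
is a plain NumPy rendering of the tiling idea (an illustration only, far
removed from the actual CUTLASS/Hopper kernel):

```python
import numpy as np

def attention_online_softmax(Q, K, V, tile=64):
    """Single-head attention computed tile-by-tile: a running row-wise max m,
    normalizer l, and output accumulator are rescaled as each K/V tile is
    processed, so the full N x N score matrix is never materialized."""
    n, d = Q.shape
    out = np.zeros((n, V.shape[1]))
    m = np.full(n, -np.inf)                       # running row-wise max
    l = np.zeros(n)                               # running row-wise normalizer
    for s in range(0, K.shape[0], tile):
        Kt, Vt = K[s:s + tile], V[s:s + tile]
        S = Q @ Kt.T / np.sqrt(d)                 # scores for this tile only
        m_new = np.maximum(m, S.max(axis=1))
        scale = np.exp(m - m_new)                 # rescale previous partials
        P = np.exp(S - m_new[:, None])
        l = l * scale + P.sum(axis=1)
        out = out * scale[:, None] + P @ Vt
        m = m_new
    return out / l[:, None]
```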
( 2
min )
This paper introduces a novel approach for topic modeling utilizing latent
codebooks from a Vector-Quantized Variational Auto-Encoder (VQ-VAE), discretely
encapsulating the rich information of pre-trained embeddings, such as those of
a pre-trained language model. From the novel interpretation of the latent
codebooks and embeddings as conceptual bag-of-words, we propose a new
generative topic model called Topic-VQ-VAE (TVQ-VAE), which inversely generates
the original documents related to the respective latent codebook. The TVQ-VAE
can visualize the topics with various generative distributions including the
traditional BoW distribution and the autoregressive image generation. Our
experimental results on document analysis and image generation demonstrate that
TVQ-VAE effectively captures the topic context which reveals the underlying
structures of the dataset and supports flexible forms of document generation.
Official implementation of the proposed TVQ-VAE is available at
https://github.com/clovaai/TVQ-VAE.
( 2
min )
The unprecedented performance of machine learning models in recent years,
particularly Deep Learning and transformer models, has resulted in their
application in various domains such as finance, healthcare, and education.
However, the models are error-prone and cannot be used autonomously, especially
in decision-making scenarios where, technically or ethically, the cost of error
is high. Moreover, because of the black-box nature of these models, it is
frequently difficult for the end user to comprehend the models' outcomes and
underlying processes to trust and use the model outcome to make a decision.
Explainable Artificial Intelligence (XAI) aids end-user understanding of the
model by utilizing approaches, including visualization techniques, to explain
and interpret the inner workings of the model and how it arrives at a result.
Although numerous research studies have been conducted recently focusing on the
performance of models and the XAI approaches, less work has been done on the
impact of explanations on human-AI team performance. This paper surveys recent
empirical studies on XAI's impact on human-AI decision-making, identifies the
challenges, and proposes future research directions.
( 2
min )
In this article we offer a comprehensive analysis of the Urysohn classifier
in a binary classification context. It utilizes Urysohn's Lemma from topology
to construct separating functions, providing rigorous and adaptable solutions.
Numerical experiments demonstrated exceptional performance, with scores ranging
from 95% to 100%. Notably, the Urysohn classifier outperformed CatBoost and
KNN in various scenarios. Despite sensitivity to the p-metric parameter, it
proved robust and adaptable. The Urysohn classifier's mathematical rigor and
adaptability make it promising for binary classification, with applications in
medical diagnosis, fraud detection and cybersecurity. Future research includes
parameter optimization and combining the Urysohn classifier with other
techniques. It offers an elegant and principled approach to classification,
ensuring integrity and valuable data insights.
( 2
min )
Recent advances in autonomous robotic technologies have highlighted the
growing need for precise environmental analysis. LiDAR semantic segmentation
has gained attention to accomplish fine-grained scene understanding by acting
directly on raw content provided by sensors. Recent solutions showed how
different learning techniques can be used to improve the performance of the
model, without any architectural or dataset change. Following this trend, we
present a coarse-to-fine setup that LEArns from classification mistaKes (LEAK)
derived from a standard model. First, classes are clustered into macro groups
according to mutual prediction errors; then, the learning process is
regularized by: (1) aligning class-conditional prototypical feature
representation for both fine and coarse classes, (2) weighting instances with a
per-class fairness index. Our LEAK approach is very general and can be
seamlessly applied on top of any segmentation architecture; indeed,
experimental results showed that it enables state-of-the-art performance on
different architectures, datasets and tasks, while ensuring more balanced
class-wise results and faster convergence.
( 2
min )
Offline reinforcement learning (RL) aims to learn an effective policy from a
pre-collected dataset. Most existing works are to develop sophisticated
learning algorithms, with less emphasis on improving the data collection
process. Moreover, it is challenging to extend beyond the single-task setting and
collect a task-agnostic dataset that allows an agent to perform multiple
downstream tasks. In this paper, we propose a Curiosity-driven Unsupervised
Data Collection (CUDC) method to expand feature space using adaptive temporal
distances for task-agnostic data collection and ultimately improve learning
efficiency and capabilities for multi-task offline RL. To achieve this, CUDC
estimates the probability of the k-step future states being reachable from the
current states, and adapts how many steps into the future the dynamics
model should predict. With this adaptive reachability mechanism in place, the
feature representation can be diversified, and the agent can navigate itself to
collect higher-quality data with curiosity. Empirically, CUDC surpasses
existing unsupervised methods in efficiency and learning performance in various
downstream offline RL tasks of the DeepMind control suite.
( 2
min )
The Distributional Random Forest (DRF) is a recently introduced Random Forest
algorithm to estimate multivariate conditional distributions. Due to its
general estimation procedure, it can be employed to estimate a wide range of
targets such as conditional average treatment effects, conditional quantiles,
and conditional correlations. However, only results about the consistency and
convergence rate of the DRF prediction are available so far. We characterize
the asymptotic distribution of DRF and develop a bootstrap approximation of it.
This allows us to derive inferential tools for quantifying standard errors and
the construction of confidence regions that have asymptotic coverage
guarantees. In simulation studies, we empirically validate the developed theory
for inference of low-dimensional targets and for testing distributional
differences between two populations.
( 2
min )
We establish novel rates for the Gaussian approximation of random deep neural
networks with Gaussian parameters (weights and biases) and Lipschitz activation
functions, in the wide limit. Our bounds apply for the joint output of a
network evaluated at any finite input set, provided a certain non-degeneracy
condition of the infinite-width covariances holds. We demonstrate that the
distance between the network output and the corresponding Gaussian
approximation scales inversely with the width of the network, exhibiting faster
convergence than the naive heuristic suggested by the central limit theorem. We
also apply our bounds to obtain theoretical approximations for the exact
Bayesian posterior distribution of the network, when the likelihood is a
bounded Lipschitz function of the network output evaluated on a (finite)
training set. This includes popular cases such as the Gaussian likelihood, i.e.
exponential of minus the mean squared error.
( 2
min )
Today we are excited to announce that the Llama Guard model is now available for customers using Amazon SageMaker JumpStart. Llama Guard provides input and output safeguards in large language model (LLM) deployment. It’s one of the components under Purple Llama, Meta’s initiative featuring open trust and safety tools and evaluations to help developers build […]
( 15
min )
In this post, you learn how to prepare data sourced from Amazon Security Lake, and then train and deploy an ML model using an IP Insights algorithm in SageMaker. This model identifies anomalous network traffic or behavior which can then be composed as part of a larger end-to-end security solution.
( 13
min )
In this issue of Research Focus: Optimized exit-augmented models for scalable efficient inference; NeurIPS LLM Efficiency Challenge; LLM-empowered automated data exploration; Boosting cloud efficiency with data-driven decision-making and optimization.
( 9
min )
Outside the glare of the klieg lights that ChatGPT commanded this year, a troupe of autonomous machines nudged the frontiers of robotics forward. Here are six that showed special prowess — swimming, diving, gripping, seeing, strolling and flying through 2023. A Media Darling at CES: Ella, a smart stroller from startup Glüxkind Technologies […]
( 7
min )
Thomson Reuters, the global content and technology company, is transforming the legal industry with generative AI. In the latest episode of NVIDIA’s AI Podcast, host Noah Kravitz spoke with Thomson Reuters Chief Product Officer David Wong about its potential — and implications. Many of Thomson Reuters’ offerings for the legal industry either address an information […]
( 6
min )
The latest OpenUSD updates enable users to tackle larger, more complex scenes with enhanced geometry control and streamlined asset management.
( 7
min )
These compounds can kill methicillin-resistant Staphylococcus aureus (MRSA), a bacterium that causes deadly infections.
( 10
min )
This new method draws on 200-year-old geometric foundations to give artists control over the appearance of animated characters.
( 10
min )
The latest industrial inference engines, such as FasterTransformer and
TurboTransformers, have verified that half-precision floating point (FP16) and
8-bit integer (INT8) quantization can greatly improve model inference speed.
However, the existing INT8 quantization methods are too complicated, and
improper usage can greatly degrade model performance. In this paper, we
develop a toolkit for users to easily quantize their models for inference, in
which Self-Adaptive Mixed-Precision (SAMP) is proposed to automatically control
quantization rate by a mixed-precision architecture to balance model accuracy
and efficiency. Experimental results show that our SAMP toolkit has a higher
speedup than PyTorch and FasterTransformer while ensuring the required
accuracy. In addition, SAMP is based on a modular design, decoupling the
tokenizer, embedding, encoder and target layers, which allows users to handle
various downstream tasks and can be seamlessly integrated into PyTorch.
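For readers unfamiliar with the underlying primitive, this is the symmetric
per-tensor INT8 quantization that mixed-precision schemes decide to apply or
skip per layer (a generic sketch, not SAMP's implementation):

```python
import numpy as np

def quantize_int8(x):
    """Symmetric per-tensor INT8 quantization: map floats to [-127, 127]
    with a single scale derived from the maximum absolute value."""
    scale = max(float(np.abs(x).max()) / 127.0, 1e-12)
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

x = np.random.randn(4, 8).astype(np.float32)
q, s = quantize_int8(x)
print(np.max(np.abs(x - dequantize(q, s))))       # error is at most ~scale / 2
```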
( 2
min )
Online social media is integral to human life, facilitating messaging,
information sharing, and confidential communication while preserving privacy.
Platforms like Twitter, Instagram, and Facebook exemplify this phenomenon.
However, users face challenges due to network anomalies, often stemming from
malicious activities such as identity theft for financial gain or harm. This
paper proposes a novel method using user similarity measures and the Generative
Adversarial Network (GAN) algorithm to identify fake user accounts in the
Twitter dataset. Despite the problem's complexity, the method achieves an AUC
rate of 80% in classifying and detecting fake accounts. Notably, the study
builds on previous research, highlighting advancements and insights into the
evolving landscape of anomaly detection in online social networks.
( 2
min )
Much research has been devoted to the problem of learning fair
representations; however, existing methods do not explicitly model the
relationships between latent representations. In many real-world applications,
there may be causal relationships between latent representations. Furthermore,
most fair representation learning methods focus on group-level fairness and
are based on correlations, ignoring the causal relationships underlying the
data. In this work, we theoretically demonstrate that using structured
representations enables downstream predictive models to achieve counterfactual
fairness, and
then we propose the Counterfactual Fairness Variational AutoEncoder (CF-VAE) to
obtain structured representations with respect to domain knowledge. The
experimental results show that the proposed method achieves better fairness and
accuracy performance than the benchmark fairness methods.
( 2
min )
In this paper, we interpret disentanglement as the discovery of local charts
of the data manifold and trace how this definition naturally leads to an
equivalent condition for disentanglement: commutativity between factors of
variation. We study the impact of this manifold framework on two classes of
problems: learning matrix exponential operators and compressing data-generating
models. In each problem, the manifold perspective yields interesting results
about the feasibility of, and fruitful approaches to, their solutions. We also link our
manifold framework to two other common disentanglement paradigms: group
theoretic and probabilistic approaches to disentanglement. In each case, we
show how these frameworks can be merged with our manifold perspective.
Importantly, we recover commutativity as a central property in both alternative
frameworks, further highlighting its importance in disentanglement.
( 2
min )
We introduce Mesogeos, a large-scale multi-purpose dataset for wildfire
modeling in the Mediterranean. Mesogeos integrates variables representing
wildfire drivers (meteorology, vegetation, human activity) and historical
records of wildfire ignitions and burned areas for 17 years (2006-2022). It is
designed as a cloud-friendly spatio-temporal dataset, namely a datacube,
harmonizing all variables in a grid of 1km x 1km x 1-day resolution. The
datacube structure offers opportunities to assess machine learning (ML) usage
in various wildfire modeling tasks. We extract two ML-ready datasets that
establish distinct tracks to demonstrate this potential: (1) short-term
wildfire danger forecasting and (2) final burned area estimation given the
point of ignition. We define appropriate metrics and baselines to evaluate the
performance of models in each track. By publishing the datacube, along with the
code to create the ML datasets and models, we encourage the community to foster
the implementation of additional tracks for mitigating the increasing threat of
wildfires in the Mediterranean.
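A datacube in this format is convenient to consume with standard array
tooling; the sketch below (the path and variable names are hypothetical, see
the published datacube for the actual layout) illustrates typical access:

```python
import xarray as xr

# Hypothetical path and variable names, for illustration only.
ds = xr.open_zarr("mesogeos_cube.zarr")                  # 1km x 1km x 1-day grid
fire_season = ds[["t2m", "ndvi", "burned_areas"]].sel(   # drivers and a target
    time=slice("2020-06-01", "2020-08-31")
)
daily_means = fire_season.mean(dim=["x", "y"])           # domain-averaged series
```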
( 2
min )
We study hypothesis testing under communication constraints, where each
sample is quantized before being revealed to a statistician. Without
communication constraints, it is well known that the sample complexity of
simple binary hypothesis testing is characterized by the Hellinger distance
between the distributions. We show that the sample complexity of simple binary
hypothesis testing under communication constraints is at most a logarithmic
factor larger than in the unconstrained setting and this bound is tight. We
develop a polynomial-time algorithm that achieves the aforementioned sample
complexity. Our framework extends to robust hypothesis testing, where the
distributions are corrupted in the total variation distance. Our proofs rely on
a new reverse data processing inequality and a reverse Markov inequality, which
may be of independent interest. For simple $M$-ary hypothesis testing, the
sample complexity in the absence of communication constraints has a logarithmic
dependence on $M$. We show that communication constraints can cause an
exponential blow-up leading to $\Omega(M)$ sample complexity even for adaptive
algorithms.
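For context, the unconstrained characterization referenced above is the
standard one (constants omitted): the sample complexity of simple binary
hypothesis testing between $P_0$ and $P_1$ satisfies $n^*(P_0,P_1) \asymp
1/H^2(P_0,P_1)$, where $H^2(P,Q) = \frac{1}{2}\sum_x
\bigl(\sqrt{P(x)}-\sqrt{Q(x)}\bigr)^2$ is the squared Hellinger distance; the
result above states that quantization inflates this by at most a logarithmic
factor.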
( 2
min )
Traditional statistical feature selection methods often struggle on
high-dimension, low-sample-size data, encountering challenging problems such
as overfitting, the curse of dimensionality, computational infeasibility, and
strong model assumptions. In this paper, we propose a novel
two-step nonparametric approach called Deep Feature Screening (DeepFS) that can
overcome these problems and identify significant features with high precision
for ultra high-dimensional, low-sample-size data. This approach first extracts
a low-dimensional representation of input data and then applies feature
screening based on multivariate rank distance correlation recently developed by
Deb and Sen (2021). This approach combines the strengths of both deep neural
networks and feature screening, and thereby has the following appealing
features in addition to its ability to handle ultra high-dimensional data
with a small number of samples: (1) it is model-free and distribution-free; (2)
it can be used for both supervised and unsupervised feature selection; and (3)
it is capable of recovering the original input data. The superiority of DeepFS
is demonstrated via extensive simulation studies and real data analyses.
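A minimal sketch of the screening step (step 1 is assumed to have produced a
low-dimensional representation, e.g. an autoencoder bottleneck; Spearman
correlation stands in here for the multivariate rank distance correlation of
Deb and Sen (2021) used by the actual method):

```python
import numpy as np
from scipy.stats import spearmanr

def deepfs_screen(X, Z, top_k=50):
    """Score each original feature in X (n_samples x p) by its strongest rank
    association with any coordinate of the learned representation Z
    (n_samples x r), and keep the indices of the top_k features."""
    p, r = X.shape[1], Z.shape[1]
    scores = np.array([
        max(abs(spearmanr(X[:, j], Z[:, k])[0]) for k in range(r))
        for j in range(p)
    ])
    return np.argsort(scores)[::-1][:top_k]
```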
( 2
min )
In this paper, we provide a geometric interpretation of the structure of Deep
Learning (DL) networks, characterized by $L$ hidden layers, a ReLU ramp
activation function, an $\mathcal{L}^2$ Schatten class (or Hilbert-Schmidt)
cost function, and input and output spaces $\mathbb{R}^Q$ with equal dimension
$Q\geq1$. The hidden layers are also defined on $\mathbb{R}^{Q}$; the training
input size $N$ can be arbitrarily large - thus, we are considering the
underparametrized regime. We apply our recent results on shallow neural
networks to construct an explicit family of minimizers for the global minimum
of the cost function in the case $L\geq Q$, which we show to be degenerate. In
the context presented here, the hidden layers of the DL network "curate" the
training inputs by recursive application of a truncation map that minimizes the
noise to signal ratio of the training inputs. Moreover, we determine a set of
$2^Q-1$ distinct degenerate local minima of the cost function. Our
constructions make no use of gradient descent algorithms at all.
( 3
min )
As AI systems become more intelligent and their behavior becomes more
challenging to assess, they may learn to game the flaws of human feedback
instead of genuinely striving to follow instructions; however, this risk can be
mitigated by controlling how LLMs generalize human feedback to situations where
it is unreliable. To better understand how reward models generalize, we craft
69 distribution shifts spanning 8 categories. We find that reward models do not
learn to evaluate `instruction-following' by default and instead favor personas
that resemble internet text. Techniques for interpreting reward models'
internal representations achieve better generalization than standard
fine-tuning, but still frequently fail to distinguish instruction-following
from conflated behaviors. We consolidate the 15 most challenging distribution
shifts into the GENeralization analogIES (GENIES) benchmark, which we hope will
enable progress toward controlling reward model generalization.
( 2
min )
Stochastic Gradient Descent (SGD) is an out-of-equilibrium algorithm used
extensively to train artificial neural networks. However, very little is known
about the extent to which SGD is crucial to the success of this technology and,
in particular, how effective it is at optimizing high-dimensional non-convex
cost functions as compared to other optimization algorithms such as Gradient
Descent (GD). In this work we leverage dynamical mean field theory to benchmark
its performance in the high-dimensional limit. To do that, we consider the
problem of recovering a hidden high-dimensional non-linearly encrypted signal,
a prototype high-dimensional non-convex hard optimization problem. We compare
the performances of SGD to GD and we show that SGD largely outperforms GD for
sufficiently small batch sizes. In particular, a power law fit of the
relaxation time of these algorithms shows that the recovery threshold for SGD
with small batch size is smaller than the corresponding one of GD.
( 2
min )
With the rapid growth of edge intelligence, the deployment of federated
learning (FL) over wireless networks has garnered increasing attention, which
is called Federated Edge Learning (FEEL). In FEEL, mobile devices both
transmit model parameters over noisy channels and collect data in diverse
environments, which poses challenges to the generalization of trained models.
Moreover, devices can engage in decentralized FL via Device-to-Device
communication while the communication topology of connected devices also
impacts the generalization of models. Most recent theoretical studies overlook
the incorporation of all these effects into FEEL when developing generalization
analyses. In contrast, our work presents an information-theoretic
generalization analysis for topology-aware FEEL in the presence of data
heterogeneity and noisy channels. Additionally, we propose a novel
regularization method called Federated Global Mutual Information Reduction
(FedGMIR) to enhance the performance of models based on our analysis. Numerical
results validate our theoretical findings and provide evidence for the
effectiveness of the proposed method.
( 2
min )
Predictive algorithms are often trained by optimizing some loss function, to
which regularization functions are added to impose a penalty for violating
constraints. As expected, the addition of such regularization functions can
change the minimizer of the objective. It is not well-understood which
regularizers change the minimizer of the loss, and, when the minimizer does
change, how it changes. We use property elicitation to take first steps towards
understanding the joint relationship between the loss and regularization
functions and the optimal decision for a given problem instance. In particular,
we give a necessary and sufficient condition on loss and regularizer pairs for
when a property changes with the addition of the regularizer, and examine some
regularizers from the fair machine learning literature that satisfy this
condition. We empirically demonstrate how algorithmic decision-making changes
as a function of both data distribution changes and hardness of the
constraints.
( 2
min )
In the domain of music and sound processing, pitch extraction plays a pivotal
role. Our research presents a specialized convolutional neural network designed
for pitch extraction, particularly from the human singing voice in acapella
performances. Notably, our approach combines synthetic data with auto-labeled
acapella sung audio, creating a robust training environment. Evaluation across
datasets comprising synthetic sounds, opera recordings, and time-stretched
vowels demonstrates its efficacy. This work paves the way for enhanced pitch
extraction in both music and voice settings.
( 2
min )
Second-order methods for deep learning -- such as KFAC -- can be useful for
neural net training. However, they are often memory-inefficient and numerically
unstable for low-precision training since their preconditioning Kronecker
factors are dense, and require high-precision matrix inversion or
decomposition. Consequently, such methods are not widely used for training
large neural networks such as transformer-based models. We address these two
issues by (i) formulating an inverse-free update of KFAC and (ii) imposing
structures in each of the Kronecker factors, resulting in a method we term
structured inverse-free natural gradient descent (SINGD). On large modern
neural networks, we show that, in contrast to KFAC, SINGD is memory efficient
and numerically robust, and often outperforms AdamW even in half precision.
Hence, our work closes a gap between first-order and second-order methods in
modern low precision training for large neural nets.
( 2
min )
This paper considers learning the hidden causal network of a linear networked
dynamical system (NDS) from the time series data at some of its nodes --
partial observability. The dynamics of the NDS are driven by colored noise that
generates spurious associations across pairs of nodes, rendering the problem
much harder. To address the challenge of noise correlation and partial
observability, we assign to each pair of nodes a feature vector computed from
the time series data of observed nodes. The feature embedding is engineered to
yield structural consistency: there exists an affine hyperplane that
consistently partitions the set of features, separating the feature vectors
corresponding to connected pairs of nodes from those corresponding to
disconnected pairs. The causal inference problem is thus addressed via
clustering the designed features. Using simple baseline supervised methods, we
demonstrate the competitive performance of the proposed causal inference
mechanism under broad connectivity regimes and noise correlation levels,
including a real world network. Further, we devise novel technical guarantees
of structural consistency for linear NDS under the considered regime.
( 3
min )
We introduce ZeroSCROLLS, a zero-shot benchmark for natural language
understanding over long texts, which contains only test and small validation
sets, without training data. We adapt six tasks from the SCROLLS benchmark, and
add four new datasets, including two novel information fusing tasks, such as
aggregating the percentage of positive reviews. Using ZeroSCROLLS, we conduct a
comprehensive evaluation of both open-source and closed large language models,
finding that Claude outperforms ChatGPT, and that GPT-4 achieves the highest
average score. However, there is still room for improvement on multiple open
challenges in ZeroSCROLLS, such as aggregation tasks, where models struggle to
pass the naive baseline. As the state of the art is a moving target, we invite
researchers to evaluate their ideas on the live ZeroSCROLLS leaderboard.
( 2
min )
Artificial intelligence (AI) and machine learning (ML) present revolutionary
opportunities to enhance our understanding of animal behavior and conservation
strategies. Using elephants, a crucial species in Africa's protected areas, as
our focal point, we delve into the role of AI and ML in their conservation.
Given the increasing amounts of data gathered from a variety of sensors like
cameras, microphones, geophones, drones, and satellites, the challenge lies in
managing and interpreting this vast data. New AI and ML techniques offer
solutions to streamline this process, helping us extract vital information that
might otherwise be overlooked. This paper focuses on the different AI-driven
monitoring methods and their potential for improving elephant conservation.
Collaborative efforts between AI experts and ecological researchers are
essential in leveraging these innovative technologies for enhanced wildlife
conservation, setting a precedent for numerous other species.
( 2
min )
We present a new data-driven topological data analysis (TDA) approach for
estimating state spaces in dynamically changing human functional brain
networks. Our approach penalizes the topological distance between networks and
clusters dynamically changing brain networks into topologically distinct
states. Our method takes into account the temporal dimension of the data
through the Wasserstein distance between networks. Our method is shown to
outperform k-means clustering, which is widely used for estimating the
state space in brain networks. The method is applied to more accurately
determine the state spaces of dynamically changing functional brain networks.
Subsequently, we address the question of whether the overall topology of brain
networks is a heritable feature using the twin study design. MATLAB code for
the method is available at https://github.com/laplcebeltrami/PH-STAT.
( 2
min )
We propose a new homotopy-based conditional gradient method for solving
convex optimization problems with a large number of simple conic constraints.
Instances of this template naturally appear in semidefinite programming
problems arising as convex relaxations of combinatorial optimization problems.
Our method is a double-loop algorithm in which the conic constraint is treated
via a self-concordant barrier, and the inner loop employs a conditional
gradient algorithm to approximate the analytic central path, while the outer
loop updates the accuracy imposed on the intermediate solution and the homotopy
parameter. Our theoretical iteration complexity is competitive with that of
state-of-the-art SDP solvers, with the decisive advantage of cheap
projection-free subroutines. Preliminary numerical experiments illustrate the
practical performance of the method.
( 2
min )
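As a toy illustration of the double-loop idea above (ours, not the paper's
SDP setting): minimize a least-squares objective over the simplex, handling
nonnegativity with a log-barrier whose weight mu is shrunk by the outer
homotopy loop, while the inner loop runs projection-free conditional
gradient steps. All names and constants here are illustrative.

    import numpy as np

    def inner_conditional_gradient(A, b, mu, x, n_steps=200):
        # Inner loop: Frank-Wolfe on 0.5||Ax - b||^2 - mu * sum(log x)
        # over the probability simplex; the LMO returns a vertex.
        for t in range(n_steps):
            grad = A.T @ (A @ x - b) - mu / x
            s = np.zeros_like(x)
            s[np.argmin(grad)] = 1.0
            gamma = min(2.0 / (t + 2.0), 0.9)  # capped: iterates stay interior
            x = (1.0 - gamma) * x + gamma * s
        return x

    def homotopy_cg(A, b, mu0=1.0, shrink=0.5, n_outer=10):
        x = np.full(A.shape[1], 1.0 / A.shape[1])  # barycenter: strictly feasible
        mu = mu0
        for _ in range(n_outer):                   # outer loop: tighten the barrier
            x = inner_conditional_gradient(A, b, mu, x)
            mu *= shrink
        return x

    rng = np.random.default_rng(0)
    A, b = rng.normal(size=(20, 5)), rng.normal(size=20)
    print(homotopy_cg(A, b))                       # nonnegative, sums to 1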
In this paper, we introduce a novel predict-and-optimize method for
profit-driven churn prevention. We frame the task of targeting customers for a
retention campaign as a regret minimization problem. The main objective is to
leverage individual customer lifetime values (CLVs) to ensure that only the
most valuable customers are targeted. In contrast, many profit-driven
strategies focus on churn probabilities while considering average CLVs. This
often results in significant information loss due to data aggregation. Our
proposed model aligns with the guidelines of Predict-and-Optimize (PnO)
frameworks and can be efficiently solved using stochastic gradient descent
methods. Results from 12 churn prediction datasets underscore the effectiveness
of our approach, which achieves the best average performance compared to other
well-established strategies in terms of average profit.
( 2
min )
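A minimal sketch of the core idea of weighting the targeting loss by
individual CLVs so the model trains end-to-end with SGD; the paper's exact
regret formulation differs, and all names here are illustrative.

    import torch
    import torch.nn.functional as F

    def clv_weighted_loss(logits, churn_labels, clv):
        # Per-customer binary loss scaled by that customer's lifetime value:
        # mistakes on valuable customers cost more, so the optimizer focuses
        # the retention campaign on them rather than on average behavior.
        per_example = F.binary_cross_entropy_with_logits(
            logits, churn_labels.float(), reduction="none")
        return (clv * per_example).mean()

    logits = torch.randn(8, requires_grad=True)
    labels = torch.randint(0, 2, (8,))
    clv = 100.0 * torch.rand(8)                 # individual lifetime values
    clv_weighted_loss(logits, labels, clv).backward()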
Deep Neural Networks are prone to learning spurious correlations embedded in
the training data, leading to potentially biased predictions. This poses risks
when deploying these models for high-stakes decision-making, such as in medical
applications. Current methods for post-hoc model correction either require
input-level annotations which are only possible for spatially localized biases,
or augment the latent feature space, thereby hoping to enforce the right
reasons. We present a novel method for model correction on the concept level
that explicitly reduces model sensitivity towards biases via gradient
penalization. When modeling biases via Concept Activation Vectors, we highlight
the importance of choosing robust directions, as traditional regression-based
approaches such as Support Vector Machines tend to result in diverging
directions. We effectively mitigate biases in controlled and real-world
settings on the ISIC, Bone Age, ImageNet and CelebA datasets using VGG, ResNet
and EfficientNet architectures. Code is available at
https://github.com/frederikpahde/rrclarc.
( 2
min )
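A minimal sketch of gradient penalization along a concept direction,
assuming the bias is encoded by a (here random) Concept Activation Vector
in some latent layer; the paper's RR-ClArC objective and its robust CAV
estimation differ in detail.

    import torch
    import torch.nn.functional as F

    def loss_with_cav_penalty(head, acts, labels, cav, lam=1.0):
        # Penalize sensitivity along the concept direction: the gradient of
        # the logits w.r.t. the latent activations, projected onto the CAV,
        # is pushed towards zero.
        acts = acts.clone().requires_grad_(True)
        logits = head(acts)
        task_loss = F.cross_entropy(logits, labels)
        (grads,) = torch.autograd.grad(logits.sum(), acts, create_graph=True)
        penalty = (grads @ cav).pow(2).mean()
        return task_loss + lam * penalty

    head = torch.nn.Linear(16, 3)
    acts, labels = torch.randn(4, 16), torch.randint(0, 3, (4,))
    cav = F.normalize(torch.randn(16), dim=0)   # illustrative concept direction
    loss_with_cav_penalty(head, acts, labels, cav).backward()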
We study the problem of learning causal representations from unknown, latent
interventions in a general setting, where the latent distribution is Gaussian
but the mixing function is completely general. We prove strong identifiability
results given unknown single-node interventions, i.e., without having access to
the intervention targets. This generalizes prior works which have focused on
weaker classes, such as linear maps or paired counterfactual data. This is also
the first instance of causal identifiability from non-paired interventions for
deep neural network embeddings. Our proof relies on carefully uncovering the
high-dimensional geometric structure present in the data distribution after a
non-linear density transformation, which we capture by analyzing quadratic
forms of precision matrices of the latent distributions. Finally, we propose a
contrastive algorithm to identify the latent variables in practice and evaluate
its performance on various tasks.
( 2
min )
Neuro-Symbolic (NeSy) predictive models hold the promise of improved
compliance with given constraints, systematic generalization, and
interpretability, as they allow one to infer labels that are consistent with some
prior knowledge by reasoning over high-level concepts extracted from
sub-symbolic inputs. It was recently shown that NeSy predictors are affected by
reasoning shortcuts: they can attain high accuracy by leveraging concepts
with unintended semantics, thus falling short of their promised advantages. Yet,
a systematic characterization of reasoning shortcuts and of potential
mitigation strategies is missing. This work fills this gap by characterizing
them as unintended optima of the learning objective and identifying four key
conditions behind their occurrence. Based on this, we derive several natural
mitigation strategies, and analyze their efficacy both theoretically and
empirically. Our analysis shows reasoning shortcuts are difficult to deal with,
casting doubt on the trustworthiness and interpretability of existing NeSy
solutions.
( 2
min )
Recent works have shown that modern deep learning models can exhibit a
sparse double descent phenomenon. Indeed, as the sparsity of the model
increases, the test performance first worsens since the model is overfitting
the training data; then, the overfitting reduces, leading to an improvement in
performance, and finally, the model begins to forget critical information,
resulting in underfitting. Such behavior prevents the use of traditional
early-stopping criteria. In this work, we make three key contributions. First, we propose
a learning framework that avoids such a phenomenon and improves generalization.
Second, we introduce an entropy measure that provides more insight into the
emergence of this phenomenon and enables the use of traditional stopping
criteria. Third, we provide a comprehensive quantitative analysis of contingent
factors such as re-initialization methods, model width and depth, and dataset
noise. The contributions are supported by empirical evidence in typical setups.
Our code is available at https://github.com/VGCQ/DSD2.
( 2
min )
Riemannian submanifold optimization with momentum is computationally
challenging because, to ensure that the iterates remain on the submanifold, we
often need to solve difficult differential equations. Here, we simplify such
difficulties for a class of sparse or structured symmetric positive-definite
matrices with the affine-invariant metric. We do so by proposing a generalized
version of the Riemannian normal coordinates that dynamically orthonormalizes
the metric and locally converts the problem into an unconstrained problem in
the Euclidean space. We use our method to simplify existing approaches for
structured covariances and develop matrix-inverse-free $2^\text{nd}$-order
optimizers for deep learning with low precision by using only matrix
multiplications. Code: https://github.com/yorkerlin/StructuredNGD-DL
( 2
min )
We present CrystalBox, a novel, model-agnostic, posthoc explainability
framework for Deep Reinforcement Learning (DRL) controllers in the large family
of input-driven environments which includes computer systems. We combine the
natural decomposability of reward functions in input-driven environments with
the explanatory power of decomposed returns. We propose an efficient algorithm
to generate future-based explanations across both discrete and continuous
control environments. Using applications such as adaptive bitrate streaming and
congestion control, we demonstrate CrystalBox's capability to generate
high-fidelity explanations. We further illustrate its higher utility across
three practical use cases: contrastive explanations, network observability, and
guided reward design, as opposed to prior explainability techniques that
identify salient features.
( 2
min )
We study simple binary hypothesis testing under both local differential
privacy (LDP) and communication constraints. We qualify our results as either
minimax optimal or instance optimal: the former hold for the set of
distribution pairs with prescribed Hellinger divergence and total variation
distance, whereas the latter hold for specific distribution pairs. For the
sample complexity of simple hypothesis testing under pure LDP constraints, we
establish instance-optimal bounds for distributions with binary support;
minimax-optimal bounds for general distributions; and (approximately)
instance-optimal, computationally efficient algorithms for general
distributions. When both privacy and communication constraints are present, we
develop instance-optimal, computationally efficient algorithms that achieve the
minimum possible sample complexity (up to universal constants). Our results on
instance-optimal algorithms hinge on identifying the extreme points of the
joint range set $\mathcal A$ of two distributions $p$ and $q$, defined as
$\mathcal A := \{(\mathbf T p, \mathbf T q) | \mathbf T \in \mathcal C\}$,
where $\mathcal C$ is the set of channels characterizing the constraints.
( 2
min )
Transfer learning (TL) from pretrained deep models is a standard practice in
modern medical image classification (MIC). However, which levels of features
to reuse is problem-dependent, and uniformly finetuning all layers of
pretrained models may be suboptimal. This insight has partly motivated the
recent differential TL strategies, such as TransFusion (TF) and layer-wise
finetuning (LWFT), which treat the layers in the pretrained models
differentially. In this paper, we add one more strategy into this family,
called TruncatedTL, which reuses and finetunes appropriate bottom layers and
directly discards the remaining layers. This yields not only superior MIC
performance but also compact models for efficient inference, compared to other
differential TL methods. Our code is available at:
https://github.com/sun-umn/TTL
( 2
min )
Modeling and synthesizing real sRGB noise is crucial for various low-level
vision tasks. The distribution of real sRGB noise is highly complex and
affected by a multitude of factors, making its accurate modeling extremely
challenging. Therefore, recent studies have proposed methods that employ
data-driven generative models, such as generative adversarial networks (GAN)
and Normalizing Flows. These studies achieve more accurate modeling of sRGB
noise compared to traditional noise modeling methods. However, there are
performance limitations due to the inherent characteristics of each generative
model. To address this issue, we propose NM-FlowGAN, a hybrid approach that
exploits the strengths of both GAN and Normalizing Flows. We simultaneously
employ a pixel-wise noise modeling network based on Normalizing Flows, and
spatial correlation modeling networks based on GAN. In our experiments, our
NM-FlowGAN outperforms other baselines on the sRGB noise synthesis task.
Moreover, the denoising neural network, trained with synthesized image pairs
from our model, also shows superior performance compared to other baselines.
Our code is available at: https://github.com/YoungJooHan/NM-FlowGAN
( 2
min )
Although promising, existing defenses against query-based attacks share a
common limitation: they offer increased robustness against attacks at the price
of a considerable accuracy drop on clean samples. In this work, we show how to
efficiently establish, at test-time, a solid tradeoff between robustness and
accuracy when mitigating query-based attacks. Given that these attacks
necessarily explore low-confidence regions, our insight is that activating
dedicated defenses, such as RND (Qin et al., NeurIPS 2021) and Random Image
Transformations (Xie et al., ICLR 2018), only for low-confidence inputs is
sufficient to prevent them. Our approach is independent of training and
supported by theory. We verify the effectiveness of our approach for various
existing defenses by conducting extensive experiments on CIFAR-10, CIFAR-100,
and ImageNet. Our results confirm that our proposal can indeed enhance these
defenses by providing better tradeoffs between robustness and accuracy when
compared to state-of-the-art approaches while being completely training-free.
( 2
min )
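A sketch of the test-time routing rule, with Gaussian input noise standing
in for a defense like RND; the threshold tau and noise scale are
illustrative, not the values used in the paper.

    import torch

    def defended_forward(model, x, tau=0.9, sigma=0.05):
        # Confident inputs take the clean path (no accuracy drop); only
        # low-confidence inputs -- the region query-based attacks must
        # explore -- are routed through the randomized defense.
        with torch.no_grad():
            probs = model(x).softmax(dim=-1)
            conf = probs.max(dim=-1).values
            defended = model(x + sigma * torch.randn_like(x)).softmax(dim=-1)
            route = (conf < tau).unsqueeze(-1)
            return torch.where(route, defended, probs)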
We introduce a new technique called Drapes to enhance the sensitivity in
searches for new physics at the LHC. By training diffusion models on side-band
data, we show how background templates for the signal region can be generated
either directly from noise, or by partially applying the diffusion process to
existing data. In the partial diffusion case, data can be drawn from side-band
regions, with the inverse diffusion performed for new target conditional
values, or from the signal region, preserving the distribution over the
conditional property that defines the signal region. We apply this technique to
the hunt for resonances using the LHCO di-jet dataset, and achieve
state-of-the-art performance for background template generation using high
level input features. We also show how Drapes can be applied to low level
inputs with jet constituents, reducing the model dependence on the choice of
input observables. Using jet constituents we can further improve sensitivity to
the signal process, but observe a loss in performance where the signal
significance before applying any selection is below 4$\sigma$.
( 2
min )
Catastrophic forgetting remains a challenge for neural networks, especially
in lifelong learning scenarios. In this study, we introduce MEtaplasticity from
Synaptic Uncertainty (MESU), inspired by metaplasticity and Bayesian inference
principles. MESU harnesses synaptic uncertainty to retain information over
time, with its update rule closely approximating the diagonal Newton's method
for synaptic updates. Through continual learning experiments on permuted MNIST
tasks, we demonstrate MESU's remarkable capability to maintain learning
performance across 100 tasks without the need for explicit task boundaries.
( 2
min )
Applications of large language models (LLMs) like ChatGPT have potential to
enhance clinical decision support through conversational interfaces. However,
challenges of human-algorithmic interaction and clinician trust are poorly
understood. GutGPT, an LLM for gastrointestinal (GI) bleeding risk prediction
and management guidance, was deployed in clinical simulation scenarios
alongside the electronic health record (EHR) with emergency medicine
physicians, internal medicine physicians, and medical students to evaluate its
effect on physician acceptance and trust in AI clinical decision support
systems (AI-CDSS). GutGPT provides risk predictions from a validated machine
learning model and evidence-based answers by querying extracted clinical
guidelines. Participants were randomized to GutGPT and an interactive
dashboard, or the interactive dashboard and a search engine. Surveys and
educational assessments administered before and after the sessions measured
technology acceptance and content mastery. Preliminary results showed mixed
effects on acceptance after using GutGPT compared to the dashboard or search
engine, but GutGPT appeared to improve content mastery based on simulation
performance. Overall, this study demonstrates that LLMs like GutGPT could
enhance effective AI-CDSS if implemented
optimally and paired with interactive interfaces.
( 3
min )
A novel method, the Pareto Envelope Augmented with Reinforcement Learning
(PEARL), has been developed to address the challenges posed by multi-objective
problems, particularly in the field of engineering where the evaluation of
candidate solutions can be time-consuming. PEARL distinguishes itself from
traditional policy-based multi-objective Reinforcement Learning methods by
learning a single policy, eliminating the need for multiple neural networks to
independently solve simpler sub-problems. Several versions inspired by deep
learning and evolutionary techniques have been crafted, catering to both
unconstrained and constrained problem domains. Curriculum Learning is harnessed
to effectively manage constraints in these versions. PEARL's performance is
first evaluated on classical multi-objective benchmarks. Additionally, it is
tested on two practical PWR core Loading Pattern optimization problems to
showcase its real-world applicability. The first problem involves optimizing
the Cycle length and the rod-integrated peaking factor as the primary
objectives, while the second problem incorporates the mean average enrichment
as an additional objective. Furthermore, PEARL addresses three types of
constraints related to boron concentration, peak pin burnup, and peak pin
power. The results are systematically compared against a conventional approach,
the Non-dominated Sorting Genetic Algorithm. Notably, PEARL, specifically the
PEARL-NdS variant, efficiently uncovers a Pareto front without necessitating
additional efforts from the algorithm designer, as opposed to a single
optimization with scaled objectives. It also outperforms the classical approach
across multiple performance metrics, including the Hyper-volume.
( 3
min )
Car following (CF) models are fundamental to describing traffic dynamics.
However, the CF behavior of human drivers is highly stochastic and nonlinear.
As a result, identifying the best CF model has been challenging and
controversial despite decades of research. Introduction of automated vehicles
has further complicated this matter, as their CF controllers remain proprietary,
though their behavior appears different from that of human drivers. This paper develops
a stochastic learning approach to integrate multiple CF models, rather than
relying on a single model. The framework is based on approximate Bayesian
computation that probabilistically concatenates a pool of CF models based on
their relative likelihood of describing observed behavior. The approach, while
data-driven, retains physical tractability and interpretability. Evaluation
results using two datasets show that the proposed approach can better reproduce
vehicle trajectories for both human driven and automated vehicles than any
single CF model considered.
( 2
min )
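A caricature of the probabilistic concatenation step, assuming two toy CF
"models" that map parameters to a predicted speed profile; the paper's
approximate Bayesian computation framework is far richer.

    import numpy as np

    def abc_weights(observed, predictions, eps=1.0):
        # Weight each car-following model by how well its simulated
        # trajectory matches the observed one (ABC-style kernel).
        d = np.array([np.linalg.norm(p - observed) for p in predictions])
        w = np.exp(-d / eps)
        return w / w.sum()

    t = np.linspace(0.0, 10.0, 50)
    observed = 10.0 + np.sin(t)                      # observed speed profile
    preds = [10.0 + 0.9 * np.sin(t), 10.0 + 0.1 * t] # two candidate CF models
    w = abc_weights(observed, preds)
    blended = w[0] * preds[0] + w[1] * preds[1]      # model-averaged prediction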
We revisit the general framework introduced by Fazlyab et al. (SIAM J. Optim.
28, 2018) to construct Lyapunov functions for optimization algorithms in
discrete and continuous time. For smooth, strongly convex objective functions,
we relax the requirements necessary for such a construction. As a result, we
are able to prove, for Polyak's ordinary differential equation and for a
two-parameter family of Nesterov algorithms, rates of convergence that improve
on those available in the literature. We analyse the interpretation of Nesterov
algorithms as discretizations of the Polyak equation. We show that the
algorithms are instances of Additive Runge-Kutta integrators and discuss the
reasons why most discretizations of the differential equation do not result in
optimization algorithms with acceleration. We also introduce a modification of
Polyak's equation and study its convergence properties. Finally we extend the
general framework to the stochastic scenario and consider an application to
random algorithms with acceleration for overparameterized models; again we are
able to prove convergence rates that improve on those in the literature.
( 2
min )
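For concreteness, the objects in question, in our own notation (which may
differ from the paper's): Polyak's heavy-ball ODE and a two-parameter
Nesterov-style discretization,

    \ddot{x}(t) + a\,\dot{x}(t) + b\,\nabla f(x(t)) = 0, \qquad a, b > 0,

    y_k = x_k + \alpha\,(x_k - x_{k-1}), \qquad
    x_{k+1} = y_k - \beta\,\nabla f(y_k).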
In this paper, we provide a geometric interpretation of the structure of Deep
Learning (DL) networks, characterized by $L$ hidden layers, a ReLU ramp
activation function, an $\mathcal{L}^2$ Schatten class (or Hilbert-Schmidt)
cost function, and input and output spaces $\mathbb{R}^Q$ with equal dimension
$Q\geq1$. The hidden layers are also defined on $\mathbb{R}^{Q}$; the training
input size $N$ can be arbitrarily large -- thus, we are considering the
underparametrized regime. We apply our recent results on shallow neural
networks to construct an explicit family of minimizers for the global minimum
of the cost function in the case $L\geq Q$, which we show to be degenerate. In
the context presented here, the hidden layers of the DL network "curate" the
training inputs by recursive application of a truncation map that minimizes the
noise-to-signal ratio of the training inputs. Moreover, we determine a set of
$2^Q-1$ distinct degenerate local minima of the cost function. Our
constructions make no use of gradient descent algorithms at all.
( 3
min )
In this paper, we interpret disentanglement as the discovery of local charts
of the data manifold and trace how this definition naturally leads to an
equivalent condition for disentanglement: commutativity between factors of
variation. We study the impact of this manifold framework to two classes of
problems: learning matrix exponential operators and compressing data-generating
models. In each problem, the manifold perspective yields interesting results
about the feasibility of, and fruitful approaches to, their solutions. We also link our
manifold framework to two other common disentanglement paradigms: group
theoretic and probabilistic approaches to disentanglement. In each case, we
show how these frameworks can be merged with our manifold perspective.
Importantly, we recover commutativity as a central property in both alternative
frameworks, further highlighting its importance in disentanglement.
( 2
min )
In modern federated learning, one of the main challenges is to account for
inherent heterogeneity and the diverse nature of data distributions for
different clients. This problem is often addressed by introducing
personalization of the models towards the data distribution of the particular
client. However, a personalized model might be unreliable when applied to
data that is not typical for this client. In such cases, it may perform worse
on these data than the non-personalized global model trained in a federated way on
the data from all the clients. This paper presents a new approach to federated
learning that allows selecting a model from global and personalized ones that
would perform better for a particular input point. It is achieved through a
careful modeling of predictive uncertainties that helps to detect local and
global in- and out-of-distribution data and use this information to select the
model that is confident in its prediction. A comprehensive experimental
evaluation on popular real-world image datasets shows the superior
performance of the model in the presence of out-of-distribution data while
performing on par with state-of-the-art personalized federated learning
algorithms in the standard scenarios.
( 2
min )
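A crude sketch of the selection rule, using predictive entropy as the
uncertainty score; the paper's modeling of local/global in- and
out-of-distribution data is more involved.

    import numpy as np

    def entropy(p):
        return -(p * np.log(p + 1e-12)).sum(axis=-1)

    def select_predictions(p_global, p_personal):
        # Per input, keep the prediction of whichever model is more
        # confident; atypical (OOD-for-this-client) inputs tend to fall
        # back to the global model.
        use_personal = entropy(p_personal) < entropy(p_global)
        return np.where(use_personal[:, None], p_personal, p_global)

    p_g = np.array([[0.60, 0.40], [0.70, 0.30]])
    p_p = np.array([[0.90, 0.10], [0.50, 0.50]])
    print(select_predictions(p_g, p_p))   # row 1: personal, row 2: global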
In this paper, we explore the capability of both the Adjacency Spectral
Embedding (ASE) and the Graph Encoder Embedding (GEE) for capturing an embedded
pseudo-clique structure in the random dot product graph setting. In both theory
and experiments, we show that this pairing of model and methods can
yield worse results than the best existing spectral clique detection methods,
demonstrating at once the methods' potential inability to capture even modestly
sized pseudo-cliques and the methods' robustness to the model contamination
giving rise to the pseudo-clique structure. To further enrich our analysis, we
also consider the Variational Graph Auto-Encoder (VGAE) model in our simulation
and real data experiments.
( 2
min )
Block majorization-minimization (BMM) is a simple iterative algorithm for
nonconvex optimization that sequentially minimizes a majorizing surrogate of
the objective function in each block coordinate while the other block
coordinates are held fixed. We consider a family of BMM algorithms for
minimizing smooth nonconvex objectives, where each parameter block is
constrained within a subset of a Riemannian manifold. We establish that this
algorithm converges asymptotically to the set of stationary points, and attains
an $\epsilon$-stationary point within $\widetilde{O}(\epsilon^{-2})$
iterations. In particular, the assumptions for our complexity results are
completely Euclidean when the underlying manifold is a product of Euclidean or
Stiefel manifolds, although our analysis makes explicit use of the Riemannian
geometry. Our general analysis applies to a wide range of algorithms with
Riemannian constraints: Riemannian MM, block projected gradient descent,
optimistic likelihood estimation, geodesically constrained subspace tracking,
robust PCA, and Riemannian CP-dictionary-learning. We experimentally validate
that our algorithm converges faster than standard Euclidean algorithms applied
to the Riemannian setting.
( 2
min )
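A minimal Euclidean instance of the scheme (block projected gradient
descent, which the abstract lists as a special case), applied to
nonnegative matrix factorization; step sizes come from the standard
quadratic majorizer with block Lipschitz constants.

    import numpy as np

    def bmm_nmf(X, rank, n_iters=200, seed=0):
        # Alternate over the two blocks W and H; minimizing the quadratic
        # majorizer of 0.5||X - WH||^2 in one block (other fixed) is exactly
        # a projected gradient step with step 1/L for that block.
        rng = np.random.default_rng(seed)
        W = rng.random((X.shape[0], rank))
        H = rng.random((rank, X.shape[1]))
        for _ in range(n_iters):
            LW = np.linalg.norm(H @ H.T, 2) + 1e-12
            W = np.maximum(W - ((W @ H - X) @ H.T) / LW, 0.0)
            LH = np.linalg.norm(W.T @ W, 2) + 1e-12
            H = np.maximum(H - (W.T @ (W @ H - X)) / LH, 0.0)
        return W, H

    X = np.abs(np.random.default_rng(1).normal(size=(30, 20)))
    W, H = bmm_nmf(X, rank=5)
    print(np.linalg.norm(X - W @ H))   # reconstruction error decreases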
Neural networks are powerful tools in various applications, and quantifying
their uncertainty is crucial for reliable decision-making. In the deep learning
field, the uncertainties are usually categorized into aleatoric (data) and
epistemic (model) uncertainty. In this paper, we point out that the existing
popular variance attenuation method highly overestimates aleatoric uncertainty.
To address this issue, we propose a new estimation method that actively
de-noises the observed data (source code available at
https://github.com/wz16/DVA). By conducting a broad range of
experiments, we demonstrate that our proposed approach provides a much closer
approximation to the actual data uncertainty than the standard method.
( 2
min )
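For context, the standard variance-attenuation head that the abstract
critiques is the heteroscedastic Gaussian negative log-likelihood; the
paper's remedy is to denoise the targets before fitting it. A minimal
sketch:

    import torch

    def variance_attenuation_nll(mean, log_var, y):
        # Heteroscedastic Gaussian NLL: the predicted variance "attenuates"
        # the squared error and is read out as aleatoric uncertainty --
        # the quantity the paper argues is overestimated on noisy targets.
        return 0.5 * (log_var + (y - mean).pow(2) / log_var.exp()).mean()

    mean = torch.randn(16, requires_grad=True)
    log_var = torch.zeros(16, requires_grad=True)
    y = torch.randn(16)
    variance_attenuation_nll(mean, log_var, y).backward()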
Current deep learning algorithms designed for automatic ECG analysis have
exhibited notable accuracy. However, akin to traditional electrocardiography,
they tend to be narrowly focused and typically address a singular diagnostic
condition. In this study, we specifically demonstrate the capability of a
single model to predict a diverse range of both cardiac and non-cardiac
discharge diagnoses based on a sole ECG collected in the emergency department.
Among the 1,076 hierarchically structured ICD codes considered, our model
achieves an AUROC exceeding 0.8 in 439 of them. This underscores the model's
proficiency in handling a wide array of diagnostic scenarios. We emphasize the
potential of utilizing this model as a screening tool, potentially integrated
into a holistic clinical decision support system for efficiently triaging
patients in the emergency department. This research underscores the remarkable
capabilities of comprehensive ECG analysis algorithms and the extensive range
of possibilities facilitated by the open MIMIC-IV-ECG dataset. Finally, our
data may play a pivotal role in revolutionizing the way ECG analysis is
performed, marking a significant advancement in the field.
( 2
min )
Traditional statistical feature selection methods often struggle on
high-dimension, low-sample-size data, encountering challenges such as
overfitting, the curse of dimensionality, computational infeasibility, and
strong model assumptions. In this paper, we propose a novel
two-step nonparametric approach called Deep Feature Screening (DeepFS) that can
overcome these problems and identify significant features with high precision
for ultra high-dimensional, low-sample-size data. This approach first extracts
a low-dimensional representation of input data and then applies feature
screening based on multivariate rank distance correlation recently developed by
Deb and Sen (2021). This approach combines the strengths of both deep neural
networks and feature screening, and thereby has the following appealing
features in addition to its ability to handle ultra high-dimensional data
with a small number of samples: (1) it is model-free and distribution-free; (2)
it can be used for both supervised and unsupervised feature selection; and (3)
it is capable of recovering the original input data. The superiority of DeepFS
is demonstrated via extensive simulation studies and real data analyses.
( 2
min )
Random Forest (RF) is a machine learning method that offers many advantages,
including the ability to easily measure variable importance. Class balancing
techniques are a well-known solution to the class imbalance problem. However,
their effect on RF variable importance has not been actively studied. In this
paper, we study the effect of class balancing on RF variable importance. Our
simulation results show that over-sampling is effective in correctly measuring
variable importance in class imbalanced situations with small sample size,
while under-sampling fails to differentiate important and non-informative
variables. We then propose a variable selection algorithm that utilizes RF
variable importance and its confidence interval. Through an experimental study
using many real and artificial datasets, we demonstrate that our proposed
algorithm efficiently selects an optimal feature set, leading to improved
prediction performance in class imbalance problems.
( 2
min )
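A minimal sketch of the over-sampling experiment: replicate minority-class
rows to balance classes, then read off RF impurity importances. Data and
seeds are illustrative.

    import numpy as np
    from sklearn.ensemble import RandomForestClassifier

    def oversampled_rf_importance(X, y, seed=0):
        # Random over-sampling: draw each class up to the majority size
        # (with replacement) before fitting the forest.
        rng = np.random.default_rng(seed)
        classes, counts = np.unique(y, return_counts=True)
        idx = np.concatenate([
            rng.choice(np.where(y == c)[0], size=counts.max(), replace=True)
            for c in classes
        ])
        rf = RandomForestClassifier(n_estimators=200, random_state=seed)
        rf.fit(X[idx], y[idx])
        return rf.feature_importances_

    rng = np.random.default_rng(1)
    X = rng.normal(size=(300, 5))
    y = (X[:, 0] + 0.2 * rng.normal(size=300) > 1.2).astype(int)  # imbalanced
    print(oversampled_rf_importance(X, y))   # feature 0 should dominate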
We study hypothesis testing under communication constraints, where each
sample is quantized before being revealed to a statistician. Without
communication constraints, it is well known that the sample complexity of
simple binary hypothesis testing is characterized by the Hellinger distance
between the distributions. We show that the sample complexity of simple binary
hypothesis testing under communication constraints is at most a logarithmic
factor larger than in the unconstrained setting and this bound is tight. We
develop a polynomial-time algorithm that achieves the aforementioned sample
complexity. Our framework extends to robust hypothesis testing, where the
distributions are corrupted in the total variation distance. Our proofs rely on
a new reverse data processing inequality and a reverse Markov inequality, which
may be of independent interest. For simple $M$-ary hypothesis testing, the
sample complexity in the absence of communication constraints has a logarithmic
dependence on $M$. We show that communication constraints can cause an
exponential blow-up leading to $\Omega(M)$ sample complexity even for adaptive
algorithms.
( 2
min )
We consider the problem of inferring latent stochastic differential equations
(SDEs) with a time and memory cost that scales independently of the amount of
data, the total length of the time series, and the stiffness of the approximate
differential equations. This is in stark contrast to typical methods for
inferring latent differential equations which, despite their constant memory
cost, have a time complexity that is heavily dependent on the stiffness of the
approximate differential equation. We achieve this computational advancement by
removing the need to solve differential equations when approximating gradients
using a novel amortization strategy coupled with a recently derived
reparametrization of expectations under linear SDEs. We show that, in practice,
this allows us to achieve similar performance to methods based on adjoint
sensitivities with more than an order of magnitude fewer evaluations of the
model in training.
( 2
min )
This paper studies the theoretical framework of the alignment process of
generative models with Reinforcement Learning from Human Feedback (RLHF). We
consider a standard mathematical formulation, the reverse-KL regularized
contextual bandit for RLHF. Despite its widespread practical application, a
rigorous theoretical analysis of this formulation remains open. We investigate
its theoretical properties both in offline and online settings and propose
efficient algorithms with finite-sample theoretical guarantees. Our work
bridges the gap between theory and practice by linking our theoretical insights
with existing practical alignment algorithms such as Direct Preference
Optimization (DPO) and Rejection Sampling Optimization (RSO). Furthermore,
these findings and connections also offer both theoretical and practical
communities new tools and insights for future algorithmic design of alignment
algorithms.
( 2
min )
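Since the abstract links the theory to DPO, here is the (standard) DPO loss
the analysis connects to: the reverse-KL regularized bandit objective
reduces to a logistic loss on policy-vs-reference log-probability ratios of
the chosen (w) and rejected (l) responses. Values below are illustrative.

    import torch
    import torch.nn.functional as F

    def dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=0.1):
        # Inputs are log pi(y|x) summed over response tokens, for the
        # trained policy and the frozen reference policy.
        margin = beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))
        return -F.logsigmoid(margin).mean()

    logp_w, logp_l = torch.tensor([-12.0]), torch.tensor([-15.0])
    ref_w, ref_l = torch.tensor([-13.0]), torch.tensor([-14.0])
    print(dpo_loss(logp_w, logp_l, ref_w, ref_l))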
We derive a concentration bound of the type `for all $n \geq n_0$ for some
$n_0$' for TD(0) with linear function approximation. We work with online TD
learning with samples from a single sample path of the underlying Markov chain.
This makes our analysis significantly different from offline TD learning or TD
learning with access to independent samples from the stationary distribution of
the Markov chain. We treat TD(0) as a contractive stochastic approximation
algorithm, with both martingale and Markov noises. Markov noise is handled
using the Poisson equation and the lack of almost sure guarantees on
boundedness of iterates is handled using the concept of relaxed concentration
inequalities.
( 2
min )
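The algorithm under analysis, for reference; the update runs along a single
trajectory of features phi(s_t) and rewards. The data below is a synthetic
stand-in, not a real Markov chain.

    import numpy as np

    def td0_linear(features, rewards, alpha=0.05, gamma=0.95):
        # Online TD(0) with linear value function V(s) = phi(s) . w,
        # updated along one sample path.
        w = np.zeros(features.shape[1])
        for t in range(len(rewards) - 1):
            phi, phi_next = features[t], features[t + 1]
            td_error = rewards[t] + gamma * phi_next @ w - phi @ w
            w += alpha * td_error * phi
        return w

    rng = np.random.default_rng(0)
    feats = rng.normal(size=(1000, 4))
    rews = rng.normal(size=1000)
    print(td0_linear(feats, rews))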
In the lead-up to next month’s CES trade show in Las Vegas, NVIDIA will unveil its latest advancements in artificial intelligence — including generative AI — and a spectrum of other cutting-edge technologies. Scheduled for Monday, Jan. 8, at 8 a.m. PT, the company’s special address will be publicly streamed. Save the date.
( 5
min )
NVIDIA DLSS 3.5 for realistic ray-traced visuals is now available on D5 Render, a real-time 3D creation software.
( 7
min )
This post was written in collaboration with Ankur Goyal and Karthikeyan Chokappa from PwC Australia’s Cloud & Digital business. Artificial intelligence (AI) and machine learning (ML) are becoming an integral part of systems and processes, enabling decisions in real time, thereby driving top and bottom-line improvements across organizations. However, putting an ML model into production […]
( 10
min )
Dementia diagnosis requires a series of different testing methods, which is
complex and time-consuming. Early detection of dementia is crucial as it can
prevent further deterioration of the condition. This paper utilizes a speech
recognition model to construct a dementia assessment system tailored for
Mandarin speakers during the picture description task. By training an
attention-based speech recognition model on voice data closely resembling
real-world scenarios, we have significantly enhanced the model's recognition
capabilities. Subsequently, we extracted the encoder from the speech
recognition model and added a linear layer for dementia assessment. We
collected Mandarin speech data from 99 subjects and acquired their clinical
assessments from a local hospital. We achieved an accuracy of 92.04% in
Alzheimer's disease detection and a mean absolute error of 9% in clinical
dementia rating score prediction.
( 2
min )
One of the challenges in deploying a machine learning model is that the
model's performance degrades as the operating environment changes. To maintain
the performance, streaming active learning is used, in which the model is
retrained by adding a newly annotated sample to the training dataset if the
prediction of the sample is not certain enough. Although many streaming active
learning methods have been proposed for classification, few efforts have been
made for regression problems, which are often handled in the industrial field.
In this paper, we propose to use the regression-via-classification framework
for streaming active learning for regression. Regression-via-classification
transforms regression problems into classification problems so that streaming
active learning methods proposed for classification problems can be applied
directly to regression problems. Experimental validation on four real data sets
shows that the proposed method can perform regression with higher accuracy at
the same annotation cost.
( 2
min )
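A minimal sketch of the pipeline: bin the continuous target, fit a
classifier, and use its confidence as the streaming query rule. Bin edges
and the threshold are illustrative choices, not the paper's.

    import numpy as np
    from sklearn.ensemble import RandomForestClassifier

    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 3))
    y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=200)

    # Regression-via-classification: discretize y into quantile bins.
    edges = np.quantile(y, [0.2, 0.4, 0.6, 0.8])
    y_bin = np.digitize(y, edges)
    clf = RandomForestClassifier(random_state=0).fit(X, y_bin)

    def should_annotate(x, threshold=0.6):
        # Streaming rule: request a label only when the predicted bin is
        # uncertain; classification AL machinery applies unchanged.
        return clf.predict_proba(x.reshape(1, -1)).max() < threshold

    print(should_annotate(rng.normal(size=3)))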
A common approach to learning mobile health (mHealth) intervention policies
is linear Thompson sampling. Two desirable mHealth policy features are (1)
pooling information across individuals and time and (2) incorporating a
time-varying baseline reward. Previous approaches pooled information across
individuals but not time, failing to capture trends in treatment effects over
time. In addition, these approaches did not explicitly model the baseline
reward, which limited the ability to precisely estimate the parameters in the
differential reward model. In this paper, we propose a novel Thompson sampling
algorithm, termed ``DML-TS-NNR'', that leverages (1) nearest-neighbors to
efficiently pool information on the differential reward function across users
and time and (2) the Double Machine Learning (DML) framework to explicitly
model baseline rewards and stay agnostic to the supervised learning algorithms
used. By explicitly modeling baseline rewards, we obtain smaller confidence
sets for the differential reward parameters. We offer theoretical guarantees on
the pseudo-regret, which are supported by empirical results. Importantly, the
DML-TS-NNR algorithm demonstrates robustness to potential misspecifications in
the baseline reward model.
( 2
min )
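The vanilla linear Thompson sampling core that DML-TS-NNR builds on (the
paper adds pooling across users and time plus DML baseline-reward
modeling); a hedged sketch with illustrative priors.

    import numpy as np

    def ts_choose(A, b, contexts, sigma2=1.0, rng=None):
        # Sample a parameter from the Gaussian posterior
        # N(A^-1 b, sigma2 * A^-1) and act greedily w.r.t. the sample.
        rng = rng or np.random.default_rng()
        mean = np.linalg.solve(A, b)
        theta = rng.multivariate_normal(mean, sigma2 * np.linalg.inv(A))
        return int(np.argmax(contexts @ theta))

    d = 3
    A, b = np.eye(d), np.zeros(d)          # prior precision and moment
    ctx = np.random.default_rng(1).normal(size=(5, d))
    a = ts_choose(A, b, ctx)
    x, r = ctx[a], 1.0                     # observe reward for chosen arm
    A += np.outer(x, x); b += r * x        # conjugate posterior update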
Move recognition in abstracts is crucial for effectively locating content
and clarifying the structure of an article. Existing move recognition
algorithms lack the ability to learn word position information to obtain
contextual semantics. This
paper proposes a novel enhanced move recognition algorithm with an improved
pre-trained model and a gated network with attention mechanism for unstructured
abstracts of Chinese scientific and technological papers. The proposed
algorithm first performs summary data segmentation and vocabulary training. The
EP-ERNIE_AT-GRU framework is leveraged to incorporate word positional
information, facilitating deep semantic learning and targeted feature
extraction. Experimental results demonstrate that the proposed algorithm
achieves 13.37% higher accuracy on the split dataset than on the original
dataset and a 7.55% improvement in accuracy over the basic comparison model.
( 2
min )
While federated learning is promising for privacy-preserving collaborative
learning without revealing local data, it remains vulnerable to white-box
attacks and struggles to adapt to heterogeneous clients. Federated distillation
(FD), built upon knowledge distillation--an effective technique for
transferring knowledge from a teacher model to student models--emerges as an
alternative paradigm, which provides enhanced privacy guarantees and addresses
model heterogeneity. Nevertheless, challenges arise due to variations in local
data distributions and the absence of a well-trained teacher model, which leads
to misleading and ambiguous knowledge sharing that significantly degrades model
performance. To address these issues, this paper proposes a selective knowledge
sharing mechanism for FD, termed Selective-FD. It includes client-side
selectors and a server-side selector to accurately and precisely identify
knowledge from local and ensemble predictions, respectively. Empirical studies,
backed by theoretical insights, demonstrate that our approach enhances the
generalization capabilities of the FD framework and consistently outperforms
baseline methods.
( 2
min )
The influx of massive amounts of data from current and upcoming cosmological
surveys necessitates compression schemes that can efficiently summarize the
data with minimal loss of information. We introduce a method that leverages the
paradigm of self-supervised machine learning in a novel manner to construct
representative summaries of massive datasets using simulation-based
augmentations. Deploying the method on hydrodynamical cosmological simulations,
we show that it can deliver highly informative summaries, which can be used for
a variety of downstream tasks, including precise and accurate parameter
inference. We demonstrate how this paradigm can be used to construct summary
representations that are insensitive to prescribed systematic effects, such as
the influence of baryonic physics. Our results indicate that self-supervised
machine learning techniques offer a promising new approach for compression of
cosmological data as well as its analysis.
( 2
min )
Many functions characterising physical systems are additively separable. This
is the case, for instance, of mechanical Hamiltonian functions in physics,
population growth equations in biology, and consumer preference and utility
functions in economics. We consider the scenario in which a surrogate of a
function is to be tested for additive separability. The detection that the
surrogate is additively separable can be leveraged to improve further learning.
Hence, it is beneficial to have the ability to test for such separability in
surrogates. The mathematical approach is to test whether the mixed partial
derivative of the surrogate is zero, or, empirically, lower than a threshold.
We present and empirically compare eight methods to compute the mixed partial
derivative of a surrogate function.
( 2
min )
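The core test is easy to state concretely: estimate the mixed partial with
central finite differences and compare it with a threshold. A two-variable
sketch (the step h and the example functions are ours):

    import numpy as np

    def mixed_partial(f, x, y, h=1e-4):
        # Central-difference estimate of d^2 f / (dx dy); identically zero
        # whenever f(x, y) = g(x) + k(y) is additively separable.
        return (f(x + h, y + h) - f(x + h, y - h)
                - f(x - h, y + h) + f(x - h, y - h)) / (4.0 * h * h)

    f_sep = lambda x, y: np.sin(x) + y ** 2      # separable
    f_mix = lambda x, y: np.sin(x * y)           # not separable
    print(abs(mixed_partial(f_sep, 0.3, 0.7)) < 1e-3)   # True
    print(abs(mixed_partial(f_mix, 0.3, 0.7)) < 1e-3)   # False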
While coresets have been growing in terms of their application, barring few
exceptions, they have mostly been limited to unsupervised settings. We consider
supervised classification problems, and non-decomposable evaluation measures in
such settings. We show that stratified uniform sampling based coresets offer
excellent empirical performance that is backed by theoretical guarantees.
We focus on the F1 score and Matthews Correlation Coefficient, two widely used
non-decomposable objective functions that are nontrivial to optimize for and
show that uniform coresets attain a lower bound for coreset size, and have good
empirical performance, comparable with ``smarter'' coreset construction
strategies.
( 2
min )
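A sketch of the stratified uniform sampling construction: sample the same
fraction uniformly within each class so that the minority class, which
drives F1 and MCC, survives in the coreset.

    import numpy as np

    def stratified_coreset(X, y, frac=0.1, seed=0):
        # Uniform sampling within each stratum (class), without replacement.
        rng = np.random.default_rng(seed)
        idx = np.concatenate([
            rng.choice(np.where(y == c)[0],
                       size=max(1, int(frac * (y == c).sum())),
                       replace=False)
            for c in np.unique(y)
        ])
        return X[idx], y[idx]

    rng = np.random.default_rng(1)
    X = rng.normal(size=(1000, 4))
    y = (rng.random(1000) < 0.05).astype(int)       # ~5% positives
    Xc, yc = stratified_coreset(X, y)
    print(len(yc), yc.mean())                       # class ratio preserved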
Theoretical guarantees in reinforcement learning (RL) are known to suffer
multiplicative blow-up factors with respect to the misspecification error of
function approximation. Yet, the nature of such \emph{approximation factors} --
especially their optimal form in a given learning problem -- is poorly
understood. In this paper we study this question in linear off-policy value
function estimation, where many open questions remain. We study the
approximation factor in a broad spectrum of settings, such as with the weighted
$L_2$-norm (where the weighting is the offline state distribution), the
$L_\infty$ norm, the presence vs. absence of state aliasing, and full vs.
partial coverage of the state space. We establish the optimal asymptotic
approximation factors (up to constants) for all of these settings. In
particular, our bounds identify two instance-dependent factors for the
$L_2(\mu)$ norm and only one for the $L_\infty$ norm, which are shown to
dictate the hardness of off-policy evaluation under misspecification.
( 2
min )
High-resolution image generation with Generative Artificial Intelligence
(GenAI) has immense potential but, due to the enormous capital investment
required for training, it is increasingly centralised in a few large
corporations and hidden behind paywalls. This paper aims to democratise
high-resolution GenAI by advancing the frontier of high-resolution generation
while remaining accessible to a broad audience. We demonstrate that existing
Latent Diffusion Models (LDMs) possess untapped potential for higher-resolution
image generation. Our novel DemoFusion framework seamlessly extends open-source
GenAI models, employing Progressive Upscaling, Skip Residual, and Dilated
Sampling mechanisms to achieve higher-resolution image generation. The
progressive nature of DemoFusion requires more passes, but the intermediate
results can serve as "previews", facilitating rapid prompt iteration.
( 2
min )
We study monotone submodular maximization under general matroid constraints
in the online setting. We prove that online optimization of a large class of
submodular functions, namely, weighted threshold potential functions, reduces
to online convex optimization (OCO). This is precisely because functions in
this class admit a concave relaxation; as a result, OCO policies, coupled with
an appropriate rounding scheme, can be used to achieve sublinear regret in the
combinatorial setting. We show that our reduction extends to many different
versions of the online learning problem, including the dynamic regret, bandit,
and optimistic-learning settings.
( 2
min )
The aim of this paper is to provide a theoretically founded investigation of
state-of-the-art learning approaches for inverse problems. We give an extended
definition of regularization methods and their convergence in terms of the
underlying data distributions, which paves the way for future theoretical
studies. Based on a simple spectral learning model previously introduced for
supervised learning, we investigate some key properties of different learning
paradigms for inverse problems, which can be formulated independently of
specific architectures. In particular we investigate the regularization
properties, bias, and critical dependence on training data distributions.
Moreover, our framework allows us to highlight and compare the specific behavior
of the different paradigms in the infinite-dimensional limit.
( 2
min )
In this work we introduce $\nu^2$-Flows, an extension of the $\nu$-Flows
method to final states containing multiple neutrinos. The architecture can
natively scale for all combinations of object types and multiplicities in the
final state for any desired neutrino multiplicities. In $t\bar{t}$ dilepton
events, the momenta of both neutrinos and correlations between them are
reconstructed more accurately than when using the most popular standard
analytical techniques, and solutions are found for all events. Inference time
is significantly faster than competing methods, and can be reduced further by
evaluating in parallel on graphics processing units. We apply $\nu^2$-Flows to
$t\bar{t}$ dilepton events and show that the per-bin uncertainties in unfolded
distributions are much closer to the limit of performance set by perfect
neutrino reconstruction than standard techniques. For the chosen double
differential observables $\nu^2$-Flows results in improved statistical
precision for each bin by a factor of 1.5 to 2 in comparison to the Neutrino
Weighting method and up to a factor of four in comparison to the Ellipse
approach.
( 3
min )
The community has explored building private inference frameworks for
transformer-based large language models (LLMs) in a server-client setting,
where the server holds the model parameters and the client inputs its private
data (or prompt) for inference. However, these frameworks impose significant
overhead when the private inputs are forward propagated through the original
LLMs. In this paper, we show that substituting the computation- and
communication-heavy operators in the transformer architecture with
privacy-computing friendly approximations can greatly reduce private
inference costs while having only a minor impact on model performance.
Compared to state-of-the-art Iron (NeurIPS 2022), our privacy-computing
friendly model inference pipeline achieves a $5\times$ acceleration in
computation and an 80% reduction in communication overhead, while retaining
nearly identical accuracy.
( 2
min )
In the field of clinical medicine, computed tomography (CT) is an effective
medical imaging modality for the diagnosis of various pathologies. Compared
with X-ray images, CT images can provide more information, including
multi-planar slices and three-dimensional structures for clinical diagnosis.
However, CT imaging requires patients to be exposed to large doses of ionizing
radiation for a long time, which may cause irreversible physical harm. In this
paper, we propose an Uncertainty-aware MedNeRF (UMedNeRF) network based on
generated radiation fields. The network can learn a continuous representation
of CT projections from 2D X-ray images by obtaining the internal structure and
depth information and using adaptive loss weights to ensure the quality of the
generated images. Our model is trained on publicly available knee and chest
datasets, and we show the results of CT projection rendering with a single
X-ray and compare our method with other methods based on generated radiation
fields.
( 2
min )
Biomedical entity linking (BioEL) has achieved remarkable progress with the
help of pre-trained language models. However, existing BioEL methods usually
struggle to handle rare and difficult entities due to long-tailed distribution.
To address this limitation, we introduce a new scheme $k$NN-BioEL, which
provides a BioEL model with the ability to reference similar instances from the
entire training corpus as clues for prediction, thus improving the
generalization capabilities. Moreover, we design a contrastive learning
objective with dynamic hard negative sampling (DHNS) that improves the quality
of the retrieved neighbors during inference. Extensive experimental results
show that $k$NN-BioEL outperforms state-of-the-art baselines on several
datasets.
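The retrieval-as-clues mechanism can be sketched as a kNN-LM-style interpolation between the model's entity scores and a distribution over retrieved neighbours; the function below is illustrative, with hypothetical names and parameters.

import numpy as np

def knn_augmented_scores(query_emb, model_scores, train_embs, train_labels,
                         k=8, lam=0.5, tau=0.1):
    # Blend the BioEL model's entity scores with a kNN distribution over
    # retrieved training instances (illustrative, not the paper's exact rule).
    sims = train_embs @ query_emb              # similarities, unit-norm embeddings
    top = np.argsort(-sims)[:k]                # k nearest training mentions
    weights = np.exp(sims[top] / tau)
    weights /= weights.sum()
    knn_scores = np.zeros_like(model_scores)
    for w, idx in zip(weights, top):
        knn_scores[train_labels[idx]] += w     # neighbour votes for its entity
    return lam * knn_scores + (1.0 - lam) * model_scores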
( 2
min )
We present a deep Graph Convolutional Kernel Machine (GCKM) for
semi-supervised node classification in graphs. The method is built from two main
types of blocks: (i) We introduce unsupervised kernel machine layers
propagating the node features in a one-hop neighborhood, using implicit node
feature mappings. (ii) We specify a semi-supervised classification kernel
machine through the lens of the Fenchel-Young inequality. We derive an
effective initialization scheme and efficient end-to-end training algorithm in
the dual variables for the full architecture. The main idea underlying GCKM is
that, because of the unsupervised core, the final model can achieve higher
performance in semi-supervised node classification when few labels are
available for training. Experimental results demonstrate the effectiveness of
the proposed framework.
( 2
min )
Inverse reinforcement learning (IRL) is computationally challenging, with
common approaches requiring the solution of multiple reinforcement learning
(RL) sub-problems. This work motivates the use of potential-based reward
shaping to reduce the computational burden of each RL sub-problem. This work
serves as a proof-of-concept and we hope will inspire future developments
towards computationally efficient IRL.
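For reference, potential-based shaping (Ng et al., 1999) adds $F(s, s') = \gamma\Phi(s') - \Phi(s)$ to the reward, which provably preserves optimal policies; a minimal sketch:

def shaped_reward(r, s, s_next, potential, gamma=0.99, done=False):
    # F(s, s') = gamma * Phi(s') - Phi(s); adding F to the reward preserves
    # optimal policies, so each RL sub-problem can be made easier safely.
    phi_next = 0.0 if done else potential(s_next)
    return r + gamma * phi_next - potential(s)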
( 2
min )
The promise of Mobile Health (mHealth) is the ability to use wearable sensors
to monitor participant physiology at high frequencies during daily life to
enable temporally-precise health interventions. However, a major challenge is
frequent missing data. Despite a rich imputation literature, existing
techniques are ineffective for the pulsative signals central to many mHealth
applications, and a lack of available datasets has stymied progress. We
address this gap with PulseImpute, the first large-scale pulsative signal
imputation challenge which includes realistic mHealth missingness models, an
extensive set of baselines, and clinically-relevant downstream tasks. Our
baseline models include a novel transformer-based architecture designed to
exploit the structure of pulsative signals. We hope that PulseImpute will
enable the ML community to tackle this significant and challenging task.
( 2
min )
Can a machine or algorithm discover or learn Kepler's first law from
astronomical sightings alone? We emulate Johannes Kepler's discovery of the
equation of the orbit of Mars with the Rudolphine tables using AI Feynman, a
physics-inspired tool for symbolic regression.
( 2
min )
Exact Bayesian inference on state-space models~(SSM) is in general
intractable, and unfortunately, basic Sequential Monte Carlo~(SMC) methods do
not yield correct approximations for complex models. In this paper, we propose
a mixed inference algorithm that computes closed-form solutions using belief
propagation as much as possible, and falls back to sampling-based SMC methods
when exact computations fail. This algorithm thus implements automatic
Rao-Blackwellization and is even exact for Gaussian tree models.
( 2
min )
Policy learning in robot-assisted surgery (RAS) lacks data efficient and
versatile methods that exhibit the desired motion quality for delicate surgical
interventions. To this end, we introduce Movement Primitive Diffusion (MPD), a
novel method for imitation learning (IL) in RAS that focuses on gentle
manipulation of deformable objects. The approach combines the versatility of
diffusion-based imitation learning (DIL) with the high-quality motion
generation capabilities of Probabilistic Dynamic Movement Primitives (ProDMPs).
This combination enables MPD to achieve gentle manipulation of deformable
objects, while maintaining data efficiency critical for RAS applications where
demonstration data is scarce. We evaluate MPD across various simulated tasks
and a real world robotic setup on both state and image observations. MPD
outperforms state-of-the-art DIL methods in success rate, motion quality, and
data efficiency.
( 2
min )
Venn Prediction (VP) is a new machine learning framework for producing
well-calibrated probabilistic predictions. In particular it provides
well-calibrated lower and upper bounds for the conditional probability of an
example belonging to each possible class of the problem at hand. This paper
proposes five VP methods based on Neural Networks (NNs), which is one of the
most widely used machine learning techniques. The proposed methods are
evaluated experimentally on four benchmark datasets and the obtained results
demonstrate the empirical well-calibratedness of their outputs and their
superiority over the outputs of the traditional NN classifier.
( 2
min )
Artificial Intelligence (AI) based image analysis has an immense potential to
support diagnostic histopathology, including cancer diagnostics. However,
developing supervised AI methods requires large-scale annotated datasets. A
potentially powerful solution is to augment training data with synthetic data.
Latent diffusion models, which can generate high-quality, diverse synthetic
images, are promising. However, the most common implementations rely on
detailed textual descriptions, which are not generally available in this
domain. This work proposes a method that constructs structured textual prompts
from automatically extracted image features. We experiment with the PCam
dataset, composed of tissue patches only loosely annotated as healthy or
cancerous. We show that including image-derived features in the prompt, as
opposed to only healthy and cancerous labels, improves the Fr\'echet Inception
Distance (FID) from 178.8 to 90.2. We also show that pathologists find it
challenging to detect synthetic images, with a median sensitivity/specificity
of 0.55/0.55. Finally, we show that synthetic data effectively trains AI
models.
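A minimal sketch of constructing a structured prompt from extracted features (feature names are hypothetical examples, not the paper's exact schema):

def build_prompt(label: str, features: dict) -> str:
    # Turn automatically extracted image features into a structured prompt.
    parts = [f"histopathology tissue patch, {label}"]
    for name, value in sorted(features.items()):
        parts.append(f"{name}: {value}")
    return ", ".join(parts)

# e.g. build_prompt("cancerous", {"nuclei density": "high",
#                                 "dominant stain": "eosin"})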
( 3
min )
Offline reinforcement learning leverages pre-collected datasets of
transitions to train policies. It can serve as effective initialization for
online algorithms, enhancing sample efficiency and speeding up convergence.
However, when such datasets are limited in size and quality, offline
pre-training can produce sub-optimal policies and lead to degraded online
reinforcement learning performance. In this paper we propose a model-based data
augmentation strategy to maximize the benefits of offline reinforcement
learning pre-training and reduce the scale of data needed to be effective. Our
approach leverages a world model of the environment trained on the offline
dataset to augment states during offline pre-training. We evaluate our approach
on a variety of MuJoCo robotic tasks and our results show it can jump-start
online fine-tuning and substantially reduce - in some cases by an order of
magnitude - the required number of environment interactions.
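A minimal sketch of the augmentation step, assuming a one-step world model trained on the offline data (all interfaces below are hypothetical):

import torch

def augment_batch(batch, world_model, policy, noise_std=0.01):
    # Perturb offline states and roll the learned world model one step to
    # synthesize additional transitions for pre-training.
    s = batch["obs"] + noise_std * torch.randn_like(batch["obs"])
    a = policy(s)
    s_next, r = world_model(s, a)              # one-step imagined rollout
    return {"obs": s, "act": a, "rew": r, "next_obs": s_next}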
( 2
min )
This paper studies the problem of CPRP, concept prerequisite relation
prediction, which is a fundamental task in using AI for education. CPRP is
usually formulated into a link-prediction task on a relationship graph of
concepts and solved by training a graph neural network (GNN) model. However,
current directed GNNs struggle to distinguish non-isomorphic graphs, which
limits the expressivity of the resulting representations. We present a
permutation-equivariant directed GNN model by
introducing the Weisfeiler-Lehman test into directed GNN learning. Our method
is then used for CPRP and evaluated on three public datasets. The experimental
results show that our model delivers better prediction performance than the
state-of-the-art methods.
( 2
min )
In this paper we propose a new method for training neural networks (NNs) for
frequency modulated continuous wave (FMCW) radar mutual interference
mitigation. Instead of training NNs to regress from interfered to clean radar
signals as in previous work, we train NNs directly on object detection maps. We
do so by performing a continuous relaxation of the cell-averaging constant
false alarm rate (CA-CFAR) peak detector, which is a well-established algorithm
for object detection using radar. With this new training objective we are able
to increase object detection performance by a large margin. Furthermore, we
introduce separable convolution kernels to strongly reduce the number of
parameters and computational complexity of convolutional NN architectures for
radar applications. We validate our contributions with experiments on
real-world measurement data and compare them against signal processing
interference mitigation methods.
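The key trick is making the CA-CFAR comparison differentiable; a 1-D sketch, assuming illustrative window sizes and a sigmoid in place of the hard threshold:

import torch
import torch.nn.functional as F

def soft_ca_cfar(x, guard=2, train=8, alpha=3.0, temp=10.0):
    # Differentiable 1-D cell-averaging CFAR: estimate noise from training
    # cells (guard cells and the cell under test are excluded), then
    # replace the hard comparison with a sigmoid.
    k = guard + train
    kernel = torch.ones(1, 1, 2 * k + 1)
    kernel[:, :, k - guard:k + guard + 1] = 0.0     # zero guard cells + CUT
    kernel = kernel / kernel.sum()
    noise = F.conv1d(x[None, None, :], kernel, padding=k)[0, 0]
    return torch.sigmoid(temp * (x - alpha * noise))  # soft detection map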
( 2
min )
This paper presents a method for learning Hamiltonian dynamics from a limited
set of data points. The Hamiltonian vector field is found by regularized
optimization over a reproducing kernel Hilbert space of vector fields that are
inherently Hamiltonian, and where the vector field is required to be odd or
even. This is done with a symplectic kernel, and it is shown how this
symplectic kernel can be modified to be odd or even. The performance of the
method is validated in simulations for two Hamiltonian systems. It is shown
that the learned dynamics are Hamiltonian, and that the learned Hamiltonian
vector field can be prescribed to be odd or even.
( 2
min )
Congenital heart disease (CHD) is a relatively rare disease that affects
patients at birth and results in extremely heterogeneous anatomical and
functional defects. 12-lead ECG signal is routinely collected in CHD patients
because it provides significant biomarkers for disease prognosis. However,
developing accurate machine learning models is challenging due to the lack of
large available datasets. Here, we suggest exploiting the Riemannian geometry
of the spatial covariance structure of the ECG signal to improve
classification. Firstly, we use covariance augmentation to mix samples across
the Riemannian geodesic between corresponding classes. Secondly, we suggest to
project the covariance matrices to their respective class Riemannian mean to
enhance the quality of feature extraction via tangent space projection. We
perform several ablation experiments and demonstrate significant improvement
compared to traditional machine learning models and deep learning on ECG time
series data.
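The covariance augmentation step amounts to sampling points along the affine-invariant geodesic between two same-class SPD matrices; a sketch assuming well-conditioned inputs:

import numpy as np
from scipy.linalg import sqrtm, fractional_matrix_power

def geodesic_mix(c1, c2, t):
    # Point at fraction t along the affine-invariant geodesic between SPD
    # matrices c1 and c2: c1^(1/2) (c1^(-1/2) c2 c1^(-1/2))^t c1^(1/2).
    r = sqrtm(c1)
    r_inv = np.linalg.inv(r)
    middle = fractional_matrix_power(r_inv @ c2 @ r_inv, t)
    return (r @ middle @ r).real               # discard numerical imaginary dust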
( 2
min )
Despite being a unique source of information on patients' status and disease
progression, clinical notes are characterized by high levels of duplication and
information redundancy. In general domain text, it has been shown that
deduplication does not harm language model (LM) pretraining, thus helping
reduce the training cost. Although large LMs have proven to learn medical
knowledge, they still require specialized domain adaptation for improved
downstream clinical tasks. By leveraging large real-world clinical corpora, we
first provided a fine-grained characterization of duplicates stemming from
common writing practices and clinical relevancy. Second, we demonstrated that
deduplicating clinical text can help clinical LMs encode less redundant
information more efficiently and does not harm classification tasks addressed
via prompt-based learning.
( 2
min )
Binary code summarization, while invaluable for understanding code semantics,
is challenging due to its labor-intensive nature. This study delves into the
potential of large language models (LLMs) for binary code comprehension. To
this end, we present BinSum, a comprehensive benchmark and dataset of over 557K
binary functions and introduce a novel method for prompt synthesis and
optimization. To more accurately gauge LLM performance, we also propose a new
semantic similarity metric that surpasses traditional exact-match approaches.
Our extensive evaluation of prominent LLMs, including ChatGPT, GPT-4, Llama 2,
and Code Llama, reveals 10 pivotal insights. This evaluation generated 4
billion inference tokens and incurred a total expense of 11,418 US dollars and
873 NVIDIA A100 GPU hours. Our findings highlight both the transformative potential
of LLMs in this field and the challenges yet to be overcome.
( 2
min )
Despite the remarkable advances in deep learning technology, achieving
satisfactory performance in lung sound classification remains a challenge due
to the scarcity of available data. Moreover, the respiratory sound samples are
collected from a variety of electronic stethoscopes, which could potentially
introduce biases into the trained models. When a significant distribution shift
occurs within the test dataset or in a practical scenario, it can substantially
decrease the performance. To tackle this issue, we introduce cross-domain
adaptation techniques, which transfer the knowledge from a source domain to a
distinct target domain. In particular, by considering different stethoscope
types as individual domains, we propose a novel stethoscope-guided supervised
contrastive learning approach. This method can mitigate domain-related
disparities and thus enables the model to distinguish respiratory sounds
regardless of which stethoscope recorded them. The experimental results on the ICBHI
dataset demonstrate that the proposed methods are effective in reducing the
domain dependency and achieving the ICBHI Score of 61.71%, which is a
significant improvement of 2.16% over the baseline.
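The approach builds on supervised contrastive learning (Khosla et al., 2020); a standard SupCon loss is sketched below, with the stethoscope-guided variant additionally using the stethoscope type when forming pairs (our reading of the abstract):

import torch
import torch.nn.functional as F

def supcon_loss(z, labels, temp=0.07):
    # Supervised contrastive loss over an embedding batch z with class
    # labels; same-class samples are pulled together, others pushed apart.
    z = F.normalize(z, dim=1)
    sim = z @ z.t() / temp
    pos_mask = labels[:, None].eq(labels[None, :]).float()
    pos_mask.fill_diagonal_(0)
    logits = sim - sim.max(dim=1, keepdim=True).values.detach()
    exp = torch.exp(logits) * (1 - torch.eye(len(z), device=z.device))
    log_prob = logits - torch.log(exp.sum(1, keepdim=True))
    mean_log_prob = (pos_mask * log_prob).sum(1) / pos_mask.sum(1).clamp(min=1)
    return -mean_log_prob.mean()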
( 2
min )
Our study focuses on the potential for modifications of Inception-like
architecture within the electrocardiogram (ECG) domain. To this end, we
introduce IncepSE, a novel network characterized by strategic architectural
incorporation that leverages the strengths of both InceptionTime and channel
attention mechanisms. Furthermore, we propose a training setup that employs
stabilization techniques aimed at tackling the formidable challenges of the
severely imbalanced PTB-XL dataset and gradient corruption. In this way, we set
a new state of the art for supervised deep learning models across the majority
of tasks. Our model consistently surpasses InceptionTime and other
state-of-the-art models in this domain, notably with a 0.013 AUROC improvement
on the "all" task, while also mitigating the inherent dataset fluctuations
during training.
( 2
min )
$B_1^+$ and $B_0$ field-inhomogeneities can significantly reduce accuracy and
robustness of MRF's quantitative parameter estimates. Additional $B_1^+$ and
$B_0$ calibration scans can mitigate this but add scan time and cannot be
applied retrospectively to previously collected data. Here, we propose a
calibration-free, sequence-adaptive deep-learning framework to estimate and
correct for $B_1^+$ and $B_0$ effects of any MRF sequence. We demonstrate its
capability on arbitrary MRF sequences at 3T, where no training data were
previously obtained. This approach can be applied to any previously acquired
and future MRF scans. The flexibility of directly applying this framework to
other quantitative sequences is also highlighted.
( 2
min )
Uncertainty Quantification (UQ) has gained traction in an attempt to fix the
black-box nature of Deep Learning. Specifically (medical) biosignals such as
electroencephalography (EEG), electrocardiography (ECG), electrooculography
(EOG) and electromyography (EMG) could benefit from good UQ, since these suffer
from a poor signal-to-noise ratio, and good human interpretability is pivotal
for medical applications and Brain Computer Interfaces. In this paper, we
review the state of the art at the intersection of Uncertainty Quantification
and Biosignal with Machine Learning. We present various methods, shortcomings,
uncertainty measures and theoretical frameworks that currently exist in this
application domain. Overall it can be concluded that promising UQ methods are
available, but that research is needed on how people and systems may interact
with an uncertainty model in a (clinical) environment.
( 2
min )
In this study, we propose an approach for predicting rare events by
exploiting time series in coevolution. Our approach involves a weighted
autologistic regression model, where we leverage the temporal behavior of the
data to enhance predictive capabilities. By addressing the issue of imbalanced
datasets, we establish constraints leading to weight estimation and to improved
performance. Evaluation on synthetic and real-world datasets confirms that our
approach outperforms state-of-the-art methods for predicting home equipment
failures.
( 2
min )
This study introduces an innovative 3D printed dry electrode tailored for
biosensing in postoperative recovery scenarios. Fabricated through a drop
coating process, the electrode incorporates a novel 2D material.
( 2
min )
Biased enhanced sampling methods utilizing collective variables (CVs) are
powerful tools for sampling conformational ensembles. Due to high intrinsic
dimensions, efficiently generating conformational ensembles for complex systems
requires enhanced sampling on high-dimensional free energy surfaces. While
methods like temperature-accelerated molecular dynamics (TAMD) can adopt many
CVs in a simulation, unbiasing the simulation requires accurate modeling of a
high-dimensional CV probability distribution, which is challenging for
traditional density estimation techniques. Here we propose an unbiasing method
based on the score-based diffusion model, a deep generative learning method
that excels in density estimation across complex data landscapes. We test the
score-based diffusion unbiasing method on TAMD simulations. The results
demonstrate that this unbiasing approach significantly outperforms traditional
unbiasing methods, and can generate accurate unbiased conformational ensembles
for simulations with many more CVs than is typical.
( 2
min )
Catastrophic forgetting (CF) is a significant challenge in continual learning
(CL). In regularization-based approaches to mitigate CF, modifications to
important training parameters are penalized in subsequent tasks using an
appropriate loss function. We propose the RTRA, a modification to the widely
used Elastic Weight Consolidation (EWC) regularization scheme, using the
Natural Gradient for loss function optimization. Our approach improves the
training of regularization-based methods without sacrificing test-data
performance. We compare the proposed RTRA approach against EWC using the
iFood251 dataset. We show that RTRA has a clear edge over the state-of-the-art
approaches.
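For reference, the EWC penalty that RTRA retains (while switching the optimizer to the natural gradient) has the standard quadratic form; a minimal sketch:

def ewc_penalty(model, fisher, old_params, lam=100.0):
    # Quadratic penalty on moving parameters that carried high Fisher
    # information for earlier tasks; lam trades stability vs. plasticity.
    loss = 0.0
    for name, p in model.named_parameters():
        loss = loss + (fisher[name] * (p - old_params[name]) ** 2).sum()
    return 0.5 * lam * loss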
( 2
min )
Rehearsal-based techniques are commonly used to mitigate catastrophic
forgetting (CF) in Incremental learning (IL). The quality of the exemplars
selected is important for this purpose and most methods do not ensure the
appropriate diversity of the selected exemplars. We propose a new technique
"DSS" -- Diverse Selection of Samples from the input data stream in the
Class-incremental learning (CIL) setup under both disjoint and fuzzy task
boundary scenarios. Our method outperforms state-of-the-art methods and is much
simpler to understand and implement.
( 2
min )
We propose a novel exemplar selection approach based on Principal Component
Analysis (PCA) and median sampling, and a neural network training regime in the
setting of class-incremental learning. This approach avoids the pitfalls due to
outliers in the data and is both simple to implement and use across various
incremental machine learning models. It also has independent usage as a
sampling algorithm. We achieve better performance compared to state-of-the-art
methods.
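A minimal sketch of how PCA plus median sampling can select outlier-robust exemplars (our reading of the abstract; details may differ):

import numpy as np

def select_exemplars(features, m):
    # Project class features onto the first principal component and keep
    # the m samples whose projections are closest to the median, which
    # avoids picking outliers as exemplars.
    x = features - features.mean(axis=0)
    _, _, vt = np.linalg.svd(x, full_matrices=False)
    proj = x @ vt[0]
    order = np.argsort(np.abs(proj - np.median(proj)))
    return order[:m]                            # indices of chosen exemplars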
( 2
min )
The goal of this series is to chronicle opinions and issues in the field of
machine learning as they stand today and as they change over time. The plan is
to host this survey periodically until the AI singularity
paperclip-frenzy-driven doomsday, keeping an updated list of topical questions
and interviewing new community members for each edition. In this issue, we
probed people's opinions on interpretable AI, the value of benchmarking in
modern NLP, the state of progress towards understanding deep learning, and the
future of academia.
( 2
min )
In this survey, we examine algorithms for conducting credit assignment in
artificial neural networks that are inspired or motivated by neurobiology,
unifying these various processes under one possible taxonomy. Our proposed
taxonomy is constructed based on how a learning algorithm answers a central
question underpinning the mechanisms of synaptic plasticity in complex adaptive
neuronal systems: where do the signals that drive the learning in individual
elements of a network come from and how are they produced? In this unified
treatment, we organize the ever-growing set of brain-inspired learning
processes into six general families and consider these in the context of
backpropagation of errors and its known criticisms. The results of this review
are meant to encourage future developments in neuro-mimetic systems and their
constituent learning processes, wherein lies the opportunity to build a strong
bridge between machine learning, computational neuroscience, and cognitive
science.
( 2
min )
In this paper we consider the adversarial contextual bandit problem in metric
spaces. The paper "Nearest neighbour with bandit feedback" tackled this problem
but when there are many contexts near the decision boundary of the comparator
policy it suffers from a high regret. In this paper we eradicate this problem,
designing an algorithm in which we can hold out any set of contexts when
computing our regret term. Our algorithm builds on that of "Nearest neighbour
with bandit feedback" and hence inherits its extreme computational efficiency.
( 2
min )
Theoretical guarantees in reinforcement learning (RL) are known to suffer
multiplicative blow-up factors with respect to the misspecification error of
function approximation. Yet, the nature of such \emph{approximation factors} --
especially their optimal form in a given learning problem -- is poorly
understood. In this paper we study this question in linear off-policy value
function estimation, where many open questions remain. We study the
approximation factor in a broad spectrum of settings, such as with the weighted
$L_2$-norm (where the weighting is the offline state distribution), the
$L_\infty$ norm, the presence vs. absence of state aliasing, and full vs.
partial coverage of the state space. We establish the optimal asymptotic
approximation factors (up to constants) for all of these settings. In
particular, our bounds identify two instance-dependent factors for the
$L_2(\mu)$ norm and only one for the $L_\infty$ norm, which are shown to
dictate the hardness of off-policy evaluation under misspecification.
( 2
min )
There have been claims that artificial intelligence is bringing about increased productivity, accuracy, and a smarter workplace. In all of this excitement, it is difficult to differentiate between fact and fantasy. When it comes to the management of workforces, what is the truth there? Within the context of real-world applications, how much hype is there? (From "How can data science and AI help HR in workforce development, evaluation, and retention?", Data Science Central.)
( 29
min )
Artificial intelligence (AI) is one of the most transformational technologies of our generation and provides opportunities to be a force for good and drive economic growth. The growth of large language models (LLMs), with hundreds of billions of parameters, has unlocked new generative AI use cases to improve customer experiences, boost employee productivity, and so […]
( 4
min )
This is a guest post co-written with Babu Srinivasan from MongoDB. As industries evolve in today’s fast-paced business landscape, the inability to have real-time forecasts poses significant challenges for industries heavily reliant on accurate and timely insights. The absence of real-time forecasts in various industries presents pressing business challenges that can significantly impact decision-making and […]
( 8
min )
In this episode of “AI Frontiers,” AI4Science Director Chris Bishop talks about the state of deep learning; his new textbook, “Deep Learning: Foundations and Concepts,” and the impact the field is having on the natural sciences.
(From "AI Frontiers: A deep dive into deep learning with Ashley Llorens and Chris Bishop", Microsoft Research.)
( 24
min )
Bilevel optimization has received more and more attention recently due to its
wide applications in machine learning. In this paper, we consider bilevel
optimization in decentralized networks. In particular, we propose a novel
single-loop algorithm for solving decentralized bilevel optimization with
strongly convex lower level problem. Our algorithm is fully single-loop and
does not require heavy matrix-vector multiplications when approximating the
hypergradient. Moreover, unlike existing methods for decentralized bilevel
optimization and federated bilevel optimization, our algorithm does not require
any gradient heterogeneity assumption. Our analysis shows that the proposed
algorithm achieves a sublinear convergence rate. Experimental results on
hyperparameter optimization problem with both synthetic and MNIST data sets
demonstrate the efficiency of the proposed algorithm.
( 2
min )
In part 1 of the series "A Different AI Scenario: AI and Justice in a Brave New World," I outlined some requirements for the role that AI would play in enforcing our laws and regulations in a more just and fair manner and what our human legislators must do to ensure that outcome. In part… (From "AI and Justice in a Brave New World: Part 3 – AI Governance", Data Science Central.)
( 23
min )
In recent years, Transformer-based self-attention mechanisms have been
successfully applied to the analysis of a variety of context-reliant data
types, from texts to images and beyond, including data from non-Euclidean
geometries. In this paper, we present such a mechanism, designed to classify
sequences of Symmetric Positive Definite matrices while preserving their
Riemannian geometry throughout the analysis. We apply our method to automatic
sleep staging on timeseries of EEG-derived covariance matrices from a standard
dataset, obtaining high levels of stage-wise performance.
( 2
min )
This paper introduces a physics-informed machine learning approach for
pathloss prediction. This is achieved by including in the training phase
simultaneously (i) physical dependencies between spatial loss field and (ii)
measured pathloss values in the field. It is shown that the solution to a
proposed learning problem improves generalization and prediction quality with a
small number of neural network layers and parameters. The latter leads to fast
inference times which are favorable for downstream tasks such as localization.
Moreover, the physics-informed formulation allows training and prediction with
a small amount of training data which makes it appealing for a wide range of
practical pathloss prediction scenarios.
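The training objective can be sketched as a standard physics-informed composite loss, with a data term on measured pathloss and a residual term enforcing the assumed spatial dependency (all names below are illustrative):

import torch

def pathloss_loss(model, x_meas, y_meas, x_coll, residual, w=0.1):
    # Data-fit term on measured pathloss plus a physics-residual term on
    # collocation points; `residual` encodes the assumed spatial
    # dependency of the loss field.
    data = torch.mean((model(x_meas) - y_meas) ** 2)
    phys = torch.mean(residual(model, x_coll) ** 2)
    return data + w * phys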
( 2
min )
Real-time monitoring of human behaviours, especially in e-Health
applications, has been an active area of research in the past decades. On top
of IoT-based sensing environments, anomaly detection algorithms have been
proposed for the early detection of abnormalities. Gradual change procedures,
commonly referred to as drift anomalies, have received much less attention in
the literature because they represent a much more challenging scenario than
sudden temporary changes (point anomalies). In this paper, we propose, for the
first time, a fully unsupervised real-time drift detection algorithm named
DynAmo, which can identify drift periods as they are happening. DynAmo
comprises a dynamic clustering component to capture the overall trends of
monitored behaviours and a trajectory generation component, which extracts
features from the densest cluster centroids. Finally, we apply an ensemble of
divergence tests on sliding reference and detection windows to detect drift
periods in the behavioural sequence.
( 2
min )
We propose a new method called the Metropolis-adjusted Mirror Langevin
algorithm for approximate sampling from distributions whose support is a
compact and convex set. This algorithm adds an accept-reject filter to the
Markov chain induced by a single step of the mirror Langevin algorithm (Zhang
et al., 2020), which is a basic discretisation of the mirror Langevin dynamics.
Due to the inclusion of this filter, our method is unbiased relative to the
target, while known discretisations of the mirror Langevin dynamics including
the mirror Langevin algorithm have an asymptotic bias. We give upper bounds for
the mixing time of the proposed algorithm when the potential is relatively
smooth, convex, and Lipschitz with respect to a self-concordant mirror
function. As a consequence of the reversibility of the Markov chain induced by
the algorithm, we obtain an exponentially better dependence on the error
tolerance for approximate sampling. We also present numerical experiments that
corroborate our theoretical findings.
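A 1-D sketch of the accept-reject step on the support (0, 1) with the entropic-type mirror map $\phi(x) = x\log x + (1-x)\log(1-x)$; this follows our reading of the algorithm, and the paper treats general self-concordant mirror functions:

import numpy as np

def mamla_step(x, f, df, h, rng):
    # One step on (0, 1): propose in the dual (mirror) space, map back
    # through the conjugate, and apply a Metropolis correction.
    grad_phi = lambda u: np.log(u / (1.0 - u))
    hess_phi = lambda u: 1.0 / (u * (1.0 - u))

    def log_q(src, dst):
        # Proposal density of dst given src, including the Jacobian
        # phi''(dst) from the change of variables y = grad_phi(x).
        mean = grad_phi(src) - h * df(src)
        var = 2.0 * h * hess_phi(src)
        y = grad_phi(dst)
        return (-0.5 * (y - mean) ** 2 / var - 0.5 * np.log(var)
                + np.log(hess_phi(dst)))

    y_new = (grad_phi(x) - h * df(x)
             + np.sqrt(2.0 * h * hess_phi(x)) * rng.standard_normal())
    x_new = 1.0 / (1.0 + np.exp(-y_new))        # grad of conjugate: sigmoid
    log_alpha = (f(x) - f(x_new)) + log_q(x_new, x) - log_q(x, x_new)
    return x_new if np.log(rng.uniform()) < log_alpha else x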
( 2
min )
(1) The enhanced capability of Graph Neural Networks (GNNs) in unsupervised
community detection of clustered nodes is attributed to their capacity to
encode both the connectivity and feature information spaces of graphs. The
identification of latent communities holds practical significance in various
domains, from social networks to genomics. Current real-world performance
benchmarks are perplexing due to the multitude of decisions influencing GNN
evaluations for this task. (2) Three metrics are compared to assess the
consistency of algorithm rankings in the presence of randomness. The
consistency and quality of performance between results obtained under
hyperparameter optimisation and with default hyperparameters are evaluated. (3)
The results compare hyperparameter optimisation with default hyperparameters,
revealing a significant performance loss when neglecting hyperparameter
investigation. A comparison of metrics indicates that ties in ranks can
substantially alter the quantification of randomness. (4) Ensuring adherence to
the same evaluation criteria may result in notable differences in the reported
performance of methods for this task. The $W$ Randomness coefficient, based on
the Wasserstein distance, is identified as providing the most robust assessment
of randomness.
( 3
min )
We study vehicle dispatching in autonomous mobility on demand (AMoD) systems,
where a central operator assigns vehicles to customer requests or rejects these
with the aim of maximizing its total profit. Recent approaches use multi-agent
deep reinforcement learning (MADRL) to realize scalable yet performant
algorithms, but train agents based on local rewards, which distorts the reward
signal with respect to the system-wide profit, leading to lower performance. We
therefore propose a novel global-rewards-based MADRL algorithm for vehicle
dispatching in AMoD systems, which resolves so far existing goal conflicts
between the trained agents and the operator by assigning rewards to agents
leveraging a counterfactual baseline. Our algorithm shows statistically
significant improvements across various settings on real-world data compared to
state-of-the-art MADRL algorithms with local rewards. We further provide a
structural analysis which shows that the utilization of global rewards can
improve implicit vehicle balancing and demand forecasting abilities. Our code
is available at https://github.com/tumBAIS/GR-MADRL-AMoD.
( 2
min )
We propose a framework that leverages foundation models as teachers, guiding
a reinforcement learning agent to acquire semantically meaningful behavior
without human feedback. In our framework, the agent receives task instructions
grounded in a training environment from large language models. Then, a
vision-language model guides the agent in learning the multi-task
language-conditioned policy by providing reward feedback. We demonstrate that
our method can learn semantically meaningful skills in a challenging open-ended
MineDojo environment while prior unsupervised skill discovery methods struggle.
Additionally, we discuss observed challenges of using off-the-shelf foundation
models as teachers and our efforts to address them.
( 2
min )
We present several methods for predicting the dynamics of Hamiltonian systems
from discrete observations of their vector field. Each method is either
informed or uninformed of the Hamiltonian property. We empirically and
comparatively evaluate the methods and observe that knowledge that the system
is Hamiltonian can be effectively exploited, and that different methods strike
different trade-offs between efficiency and effectiveness for different
dynamical systems.
( 2
min )
In real-world scenarios classification models are often required to perform
robustly when predicting samples belonging to classes that have not appeared
during its training stage. Open Set Recognition addresses this issue by
devising models capable of detecting unknown classes from samples arriving
during the testing phase, while maintaining a good level of performance in the
classification of samples belonging to known classes. This review
comprehensively overviews the recent literature related to Open Set
Recognition, identifying common practices, limitations, and connections of this
field with other machine learning research areas, such as continual learning,
out-of-distribution detection, novelty detection, and uncertainty estimation.
Our work also uncovers open problems and suggests several research directions
that may motivate and articulate future efforts towards safer Artificial
Intelligence methods.
( 2
min )
Humanoid robots will be able to assist humans in their daily life, in
particular due to their versatile action capabilities. However, while these
robots need a certain degree of autonomy to learn and explore, they also should
respect various constraints, for access control and beyond. We explore the
novel field of incorporating privacy, security, and access control constraints
with robot task planning approaches. We report preliminary results on the
classical symbolic approach, deep-learned neural networks, and modern ideas
using large language models as knowledge base. From analyzing their trade-offs,
we conclude that a hybrid approach is necessary, and thereby present a new use
case for the emerging field of neuro-symbolic artificial intelligence.
( 2
min )
In continual learning, networks confront a trade-off between stability and
plasticity when trained on a sequence of tasks. To bolster plasticity without
sacrificing stability, we propose a novel training algorithm called LRFR. This
approach optimizes network parameters in the null space of the past tasks'
feature representation matrix to guarantee stability. Concurrently, we
judiciously select only a subset of neurons in each layer of the network while
training individual tasks to learn the past tasks' feature representation
matrix in low-rank. This increases the null space dimension when designing
network parameters for subsequent tasks, thereby enhancing the plasticity.
Using CIFAR-100 and TinyImageNet as benchmark datasets for continual learning,
the proposed approach consistently outperforms state-of-the-art methods.
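The stability half of the method can be sketched as projecting updates onto the approximate null space of the past tasks' feature matrix (a simplified vector version; the paper operates layer-wise):

import torch

def null_space_update(grad, past_feats, eps=1e-4):
    # Project a gradient onto the approximate null space of the past
    # tasks' feature matrix so updates barely disturb old representations.
    u, s, _ = torch.linalg.svd(past_feats.t() @ past_feats)
    basis = u[:, s / s.max() < eps]             # near-null directions
    return basis @ (basis.t() @ grad)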
( 2
min )
We propose HAROOD as a short-range FMCW radar-based human activity classifier
and out-of-distribution (OOD) detector. It aims to classify human sitting,
standing, and walking activities and to detect any other moving or stationary
object as OOD. We introduce a two-stage network. The first stage is trained
with a novel loss function that includes intermediate reconstruction loss,
intermediate contrastive loss, and triplet loss. The second stage uses the
first stage's output as its input and is trained with cross-entropy loss. It
creates a simple classifier that performs the activity classification. On our
dataset collected by 60 GHz short-range FMCW radar, we achieve an average
classification accuracy of 96.51%. Also, we achieve an average AUROC of 95.04%
as an OOD detector. Additionally, our extensive evaluations demonstrate the
superiority of HAROOD over the state-of-the-art OOD detection methods in terms
of standard OOD detection metrics.
( 2
min )
We address the Continual Learning (CL) problem, where a model has to learn a
sequence of tasks from non-stationary distributions while preserving prior
knowledge as it encounters new experiences. With the advancement of foundation
models, CL research has shifted focus from the initial learning-from-scratch
paradigm to the use of generic features from large-scale pre-training. However,
existing approaches to CL with pre-trained models only focus on separating the
class-specific features from the final representation layer and neglect the
power of intermediate representations that capture low- and mid-level features
naturally more invariant to domain shifts. In this work, we propose LayUP, a
new class-prototype-based approach to continual learning that leverages
second-order feature statistics from multiple intermediate layers of a
pre-trained network. Our method is conceptually simple, does not require any
replay buffer, and works out of the box with any foundation model. LayUP
improves over the state-of-the-art on four of the seven class-incremental
learning settings at a considerably reduced memory and computational footprint
compared with the next best baseline. Our results demonstrate that fully
exhausting the representational capacities of pre-trained models in CL goes far
beyond their final embeddings.
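A minimal sketch of a class-prototype head built from second-order (Gram) statistics of concatenated intermediate-layer features; this is our illustrative reading, not LayUP's exact estimator:

import torch
import torch.nn.functional as F

def fit_prototype_head(feats, labels, n_classes, lam=1.0):
    # Ridge-style class weights from the Gram (second-order) statistics of
    # concatenated multi-layer features; no replay buffer is needed.
    d = feats.shape[1]
    gram = feats.t() @ feats + lam * torch.eye(d)
    onehot = F.one_hot(labels, n_classes).float()
    return torch.linalg.solve(gram, feats.t() @ onehot)  # (d, n_classes)

# inference: logits = multi_layer_features(x) @ weights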
( 2
min )
Deep Reinforcement Learning (DRL) has achieved remarkable advances in
sequential decision tasks. However, recent works have revealed that DRL agents
are susceptible to slight perturbations in observations. This vulnerability
raises concerns regarding the effectiveness and robustness of deploying such
agents in real-world applications. In this work, we propose a novel robust
reinforcement learning method called SortRL, which improves the robustness of
DRL policies against observation perturbations from the perspective of the
network architecture. We employ a novel architecture for the policy network
that incorporates global $l_\infty$ Lipschitz continuity and provide a
convenient method to enhance policy robustness based on the output margin.
Besides, a training framework is designed for SortRL, which solves given tasks
while maintaining robustness against $l_\infty$ bounded perturbations on the
observations. Several experiments are conducted to evaluate the effectiveness
of our method, including classic control tasks and video games. The results
demonstrate that SortRL achieves state-of-the-art robustness performance
against different perturbation strengths.
( 2
min )
Many neural network architectures have been shown to be Turing Complete, and
can thus implement arbitrary algorithms. However, Transformers are unique in
that they can implement gradient-based learning algorithms \emph{under simple
parameter configurations}. A line of recent work shows that linear Transformers
naturally learn to implement gradient descent (GD) when trained on a linear
regression in-context learning task. But the linearity assumption (either in
the Transformer architecture or in the learning task) is far from realistic
settings where non-linear activations crucially enable Transformers to learn
complicated non-linear functions. In this paper, we provide theoretical and
empirical evidence that non-linear Transformers can, and \emph{in fact do},
learn to implement learning algorithms to learn non-linear functions in
context. Our results apply to a broad class of combinations of non-linear
architectures, and non-linear in-context learning tasks. Interestingly, we show
that the optimal choice of non-linear activation depends in a natural way on
the non-linearity of the learning task.
( 2
min )
Melanoma is a type of cancer that begins in the cells controlling the pigment
of the skin, and it is often referred to as the most dangerous skin cancer.
Diagnosing melanoma can be time-consuming, and a recent increase in melanoma
incidents indicates a growing demand for a more efficient diagnostic process.
This paper presents a pipeline for melanoma diagnostics, leveraging two
convolutional neural networks, a diagnosis, and a prognosis model. The
diagnostic model is responsible for localizing malignant patches across whole
slide images and delivering a patient-level diagnosis as malignant or benign.
Further, the prognosis model utilizes the diagnostic model's output to provide
a patient-level prognosis as good or bad. The full pipeline has an F1 score of
0.79 when tested on data from the same distribution as it was trained on.
( 2
min )
Polyp segmentation, a challenging problem in medical imaging, has seen numerous
proposed methods aimed at improving the quality of segmented masks. Currently,
state-of-the-art techniques yield impressive results. However, the sheer size
of these models poses challenges for practical industry applications. To
address this, we present a Knowledge Distillation framework, incorporating
attention supervision and the symmetrical guiding method. This framework is
designed to facilitate knowledge transfer from a teacher model to a more
compact student model with fewer parameters. Our experimental evaluation of the
framework assesses its effectiveness in enabling the student model to acquire
knowledge from the teacher efficiently. Additionally, our method serves to
prevent the student model from incorporating redundant features that could lead
to inaccurate predictions. Consequently, our method, boasting approximately 5
million parameters, achieves competitive results comparable to the
state-of-the-art approaches. The implementation can be found at:
https://github.com/huyquoctrinh/KDAS3
( 2
min )
In this work, we formally prove that, under certain conditions, if a neural
network is invariant to a finite group then its weights recover the Fourier
transform on that group. This provides a mathematical explanation for the
emergence of Fourier features -- a ubiquitous phenomenon in both biological and
artificial learning systems. The results hold even for non-commutative groups,
in which case the Fourier transform encodes all the irreducible unitary group
representations. Our findings have consequences for the problem of symmetry
discovery. Specifically, we demonstrate that the algebraic structure of an
unknown group can be recovered from the weights of a network that is at least
approximately invariant within certain bounds. Overall, this work contributes
to a foundation for an algebraic learning theory of invariant neural network
representations.
( 2
min )
This article presents a new methodology for extracting intervals when a home
is vacant from low-frequency electricity consumption data. The approach
combines multiple algorithms, including change point detection, classification,
period detection, and periodic spikes retrieval. It shows encouraging results
on both simulated and real consumption curves. This approach offers practical
insights for optimizing energy use and holds potential benefits for residential
consumers and utility companies in terms of energy cost reduction and
sustainability. Further research is needed to enhance its applicability in
diverse settings and with larger datasets.
( 2
min )
In various scientific and engineering applications, there is typically an
approximate model of the underlying complex system, even though it contains
both aleatoric and epistemic uncertainties. In this paper, we present a
principled method to incorporate these approximate models as physics priors in
modeling, to prevent overfitting and enhance the generalization capabilities
of the trained models. Utilizing the structural risk minimization (SRM)
inductive principle pioneered by Vapnik, this approach structures the physics
priors into generalized regularizers. The experimental results demonstrate that
our method achieves up to two orders of magnitude of improvement in testing
accuracy.
( 2
min )
We present a novel deep learning method for estimating time-dependent
parameters in Markov processes through discrete sampling. Departing from
conventional machine learning, our approach reframes parameter approximation as
an optimization problem using the maximum likelihood approach. Experimental
validation focuses on parameter estimation in multivariate regression and
stochastic differential equations (SDEs). Theoretical results show that, under
specific conditions, the true solution is close to that of the SDE with
parameters approximated by our neural network. Our work contributes to SDE-based
model parameter estimation, offering a versatile tool for diverse fields.
( 2
min )
We introduce a new framework to detect perceptual bugs using a Long
Short-Term Memory (LSTM) network, which detects bugs in video games as
anomalies. The detected buggy frames are then clustered to determine the
category of the bug that occurred. The framework was evaluated on two First Person
Shooter (FPS) games. Results show the effectiveness of the framework.
( 2
min )
Cardiovascular diseases, particularly heart failure, are a leading cause of
death globally. The early detection of heart failure through routine
echocardiogram screenings is often impeded by the high cost and labor-intensive
nature of these procedures, a barrier that can mean the difference between life
and death. This paper presents ConFormer, a novel deep learning model designed
to automate the estimation of Ejection Fraction (EF) and Left Ventricular Wall
Thickness from echocardiograms. The implementation of ConFormer has the
potential to enhance preventative cardiology by enabling cost-effective,
accessible, and comprehensive heart health monitoring, thereby saving countless
lives. The source code is available at https://github.com/Aether111/ConFormer.
( 2
min )
Hypernetworks are meta neural networks that generate weights for a main
neural network in an end-to-end differentiable manner. Despite extensive
applications ranging from multi-task learning to Bayesian deep learning, the
problem of optimizing hypernetworks has not been studied to date. We observe
that classical weight initialization methods like Glorot & Bengio (2010) and He
et al. (2015), when applied directly on a hypernet, fail to produce weights for
the mainnet in the correct scale. We develop principled techniques for weight
initialization in hypernets, and show that they lead to more stable mainnet
weights, lower training loss, and faster convergence.
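The core idea can be sketched as initializing the hypernet's output layer so that the generated mainnet weights land in the He (2015) scale; a simplified version under the assumption of unit-variance embedding inputs:

import math
import torch.nn as nn

def init_hypernet_output(out_layer: nn.Linear, mainnet_fan_in: int):
    # If the hypernet's embedding inputs have unit variance, generated
    # mainnet weights get variance ~ 2 / fan_in (He-style), which naive
    # Glorot/He init applied to the hypernet itself fails to deliver.
    target_var = 2.0 / mainnet_fan_in
    std = math.sqrt(target_var / out_layer.in_features)
    nn.init.normal_(out_layer.weight, std=std)
    nn.init.zeros_(out_layer.bias)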
( 2
min )
In this paper, we propose a novel personalized decision support system that
combines Theory of Mind (ToM) modeling and explainable Reinforcement Learning
(XRL) to provide effective and interpretable interventions. Our method
leverages DRL to provide expert action recommendations while incorporating ToM
modeling to understand users' mental states and predict their future actions,
enabling appropriate timing for intervention. To explain interventions, we use
counterfactual explanations based on RL's feature importance and users' ToM
model structure. Our proposed system generates accurate and personalized
interventions that are easily interpretable by end-users. We demonstrate the
effectiveness of our approach through a series of crowd-sourcing experiments in
a simulated team decision-making task, where our system outperforms control
baselines in terms of task performance. Our proposed approach is agnostic to
task environment and RL model structure, therefore has the potential to be
generalized to a wide range of applications.
( 2
min )
In many applications, such as scientific literature management, researcher
search, and social network analysis, Name Disambiguation (aiming to
disambiguate WhoIsWho) has been a challenging problem. In addition, the
growth of scientific literature makes the problem more difficult and urgent.
Although name disambiguation has been extensively studied in academia and
industry, the problem has not been solved well due to the clutter of data and
the complexity of the same name scenario. In this work, we aim to explore
models that can perform the task of name disambiguation using the network
structure that is intrinsic to the problem and present an analysis of the
models.
( 2
min )
The high dimensionality and complexity of neuroimaging data necessitate large
datasets to develop robust and high-performing deep learning models. However,
the neuroimaging field is notably hampered by the scarcity of such datasets. In
this work, we proposed a data augmentation and validation framework that
utilizes dynamic forecasting with Long Short-Term Memory (LSTM) networks to
enrich datasets. We extended multivariate time series data by predicting the
time courses of independent component networks (ICNs) in both one-step and
recursive configurations. The effectiveness of these augmented datasets was
then compared with the original data using various deep learning models
designed for chronological age prediction tasks. The results suggest that our
approach improves model performance, providing a robust solution to overcome
the challenges presented by the limited size of neuroimaging datasets.
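A minimal sketch of the recursive configuration, where the LSTM's own one-step forecasts are fed back to extend an ICN time course (module names are placeholders):

import torch

def recursive_forecast(lstm, head, seed_seq, n_steps):
    # Encode the observed window, then repeatedly feed the model's own
    # one-step prediction back in to extend the ICN time course.
    out, state = lstm(seed_seq)                 # seed_seq: (1, T, n_components)
    preds = []
    x = head(out[:, -1:])                       # first one-step forecast
    for _ in range(n_steps - 1):
        preds.append(x)
        out, state = lstm(x, state)
        x = head(out[:, -1:])
    preds.append(x)
    return torch.cat([seed_seq] + preds, dim=1)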
( 2
min )
Motivated by policy gradient methods in the context of reinforcement
learning, we derive the first large deviation rate function for the iterates
generated by stochastic gradient descent for possibly non-convex objectives
satisfying a Polyak-Lojasiewicz condition. Leveraging the contraction principle
from large deviations theory, we illustrate the potential of this result by
showing how convergence properties of policy gradient with a softmax
parametrization and an entropy regularized objective can be naturally extended
to a wide spectrum of other policy parametrizations.
( 2
min )
We study Off-Policy Evaluation (OPE) in contextual bandit settings with large
action spaces. The benchmark estimators suffer from severe bias and variance
tradeoffs. Parametric approaches suffer from bias due to difficulty specifying
the correct model, whereas ones with importance weight suffer from variance. To
overcome these limitations, Marginalized Inverse Propensity Scoring (MIPS) was
proposed to mitigate the estimator's variance via embeddings of an action.
Nevertheless, MIPS is unbiased only under the no-direct-effect assumption,
which requires that the action embedding completely mediates the effect of an
action on the reward. To avoid depending on this unrealistic assumption, we
propose a Marginalized Doubly Robust (MDR) estimator. Theoretical analysis
shows that the proposed estimator is unbiased under weaker assumptions than
MIPS while reducing the variance relative to MIPS. Empirical experiments verify
the superiority of MDR over existing estimators with large action spaces.
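Structurally, the MDR estimate combines an embedding-based importance-weighted residual with a direct-method term, in the usual doubly robust pattern; a sketch with illustrative interfaces:

import numpy as np

def mdr_estimate(rewards, w_emb, q_hat, q_pi):
    # Embedding-based importance weights correct the reward-model residual,
    # and q_pi is the direct-method value under the evaluation policy.
    return np.mean(w_emb * (rewards - q_hat) + q_pi)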
( 2
min )
A default assumption in reinforcement learning (RL) and optimal control is
that observations arrive at discrete time points on a fixed clock cycle. Yet,
many applications involve continuous-time systems where the time
discretization, in principle, can be managed. The impact of time discretization
on RL methods has not been fully characterized in existing theory, but a more
detailed analysis of its effect could reveal opportunities for improving
data-efficiency. We address this gap by analyzing Monte-Carlo policy evaluation
for LQR systems and uncover a fundamental trade-off between approximation and
statistical error in value estimation. Importantly, these two errors behave
differently to time discretization, leading to an optimal choice of temporal
resolution for a given data budget. These findings show that managing the
temporal resolution can provably improve policy evaluation efficiency in LQR
systems with finite data. Empirically, we demonstrate the trade-off in
numerical simulations of LQR instances and standard RL benchmarks for
non-linear continuous control.
( 2
min )
Gaussian process regression is a classical kernel method for function
estimation and data interpolation. In large data applications, computational
costs can be reduced using low-rank or sparse approximations of the kernel.
This paper investigates the effect of such kernel approximations on the
interpolation error. We introduce a unified framework to analyze Gaussian
process regression under important classes of computational misspecification:
Karhunen-Lo\`eve expansions that result in low-rank kernel approximations,
multiscale wavelet expansions that induce sparsity in the covariance matrix,
and finite element representations that induce sparsity in the precision
matrix. Our theory also accounts for epistemic misspecification in the choice
of kernel parameters.
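As one concrete instance of such approximations, a Nystrom/subset-of-regressors posterior mean replaces the full kernel matrix with a rank-m factorization (a sketch; the paper's analysis covers this and other classes):

import numpy as np

def low_rank_gp_mean(K_nm, K_mm, y, noise_var=1e-2):
    # Subset-of-regressors / Nystrom posterior mean: solve an m x m system
    # instead of the full n x n one, with m the number of inducing points.
    A = noise_var * K_mm + K_nm.T @ K_nm
    alpha = np.linalg.solve(A, K_nm.T @ y)
    return K_nm @ alpha                          # approximate posterior mean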
( 2
min )
This paper considers the problem of evaluating an autonomous system's
competency in performing a task, particularly when working in dynamic and
uncertain environments. The inherent opacity of machine learning models, often
described from the user's perspective as a `black box', poses a challenge. To
overcome this, we propose using a measure called the Surprise index, which
leverages available measurement data to quantify whether the dynamic system
performs as expected. We show that the surprise index can be computed in
closed form when the joint distribution of the observed evidence in the
probabilistic model is multivariate Gaussian. We then apply it to a nonlinear
spacecraft maneuver problem, where actions are chosen by a reinforcement
learning agent and show it can indicate how well the trajectory follows the
required orbit.
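For Gaussian evidence, one closed form consistent with this claim scores an observation by the probability mass of outcomes more probable than it, via the Mahalanobis distance (our reading; the paper's exact definition may differ):

import numpy as np
from scipy.stats import chi2

def surprise_index(z, mean, cov):
    # Probability mass of outcomes more probable than observation z under
    # a Gaussian model; values near 1 flag surprising behaviour.
    d = z - mean
    m2 = float(d @ np.linalg.solve(cov, d))      # squared Mahalanobis distance
    return chi2.cdf(m2, df=len(z))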
( 2
min )
Predictive Process Monitoring (PPM) aims at leveraging historic process
execution data to predict how ongoing executions will continue up to their
completion. In recent years, PPM techniques for the prediction of the next
activities have matured significantly, mainly thanks to the use of Neural
Networks (NNs) as a predictor. While their performance is difficult to beat in
the general case, there are specific situations where background process
knowledge can be helpful. Such knowledge can be leveraged for improving the
quality of predictions for exceptional process executions or when the process
changes due to a concept drift. In this paper, we present a Symbolic[Neuro]
system that leverages background knowledge expressed in terms of a procedural
process model to offset the under-sampling in the training data. More
specifically, we make predictions using NNs with attention mechanism, an
emerging technology in the NN field. The system has been tested on several
real-life logs showing an improvement in the performance of the prediction
task.
( 2
min )
A large amount of effort has recently been put into understanding the barren
plateau phenomenon. In this perspective article, we face the increasingly loud
elephant in the room and ask a question that has been hinted at by many but not
explicitly addressed: Can the structure that allows one to avoid barren
plateaus also be leveraged to efficiently simulate the loss classically? We
present strong evidence that commonly used models with provable absence of
barren plateaus are also classically simulable, provided that one can collect
some classical data from quantum devices during an initial data acquisition
phase. This follows from the observation that barren plateaus result from a
curse of dimensionality, and that current approaches for avoiding them end up
encoding the problem into some small, classically simulable, subspaces. This
sheds serious doubt on the non-classicality of the information processing
capabilities of parametrized quantum circuits for barren plateau-free
landscapes and on the possibility of superpolynomial advantages from running
them on quantum hardware. We end by discussing caveats in our arguments, the
role of smart initializations, and by highlighting new opportunities that our
perspective raises.
( 3
min )
We propose a new method called the Metropolis-adjusted Mirror Langevin
algorithm for approximate sampling from distributions whose support is a
compact and convex set. This algorithm adds an accept-reject filter to the
Markov chain induced by a single step of the mirror Langevin algorithm (Zhang
et al., 2020), which is a basic discretisation of the mirror Langevin dynamics.
Due to the inclusion of this filter, our method is unbiased relative to the
target, while known discretisations of the mirror Langevin dynamics including
the mirror Langevin algorithm have an asymptotic bias. We give upper bounds for
the mixing time of the proposed algorithm when the potential is relatively
smooth, convex, and Lipschitz with respect to a self-concordant mirror
function. As a consequence of the reversibility of the Markov chain induced by
the algorithm, we obtain an exponentially better dependence on the error
tolerance for approximate sampling. We also present numerical experiments that
corroborate our theoretical findings.
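A minimal one-dimensional sketch of the scheme, assuming the entropic mirror map on (0, 1) and an invented quadratic potential (not the paper's experimental setup): one mirror Langevin step is Gaussian in the dual variable, and the accept-reject filter uses that proposal density together with the Jacobian of the mirror map.

    import numpy as np

    rng = np.random.default_rng(0)
    f  = lambda x: 10 * (x - 0.3) ** 2            # potential; target ~ exp(-f)
    df = lambda x: 20 * (x - 0.3)
    grad_phi     = lambda x: np.log(x / (1 - x))  # entropic mirror map gradient
    grad_phi_inv = lambda y: 1 / (1 + np.exp(-y))
    hess_phi     = lambda x: 1 / (x * (1 - x))

    def log_q(x_new, x, h):
        # Proposal log-density: one mirror Langevin step is Gaussian in the
        # dual variable y; the Jacobian dy/dx = phi''(x_new) maps it back.
        mean, var = grad_phi(x) - h * df(x), 2 * h * hess_phi(x)
        r = grad_phi(x_new) - mean
        return -0.5 * (np.log(2 * np.pi * var) + r ** 2 / var) \
               + np.log(hess_phi(x_new))

    def mamla_step(x, h):
        # Mirror Langevin proposal, then the Metropolis accept-reject filter
        y = grad_phi(x) - h * df(x) \
            + np.sqrt(2 * h * hess_phi(x)) * rng.standard_normal()
        x_new = grad_phi_inv(y)
        log_a = f(x) - f(x_new) + log_q(x, x_new, h) - log_q(x_new, x, h)
        return x_new if np.log(rng.uniform()) < log_a else x

    x, chain = 0.5, []
    for _ in range(20000):
        x = mamla_step(x, h=0.01)
        chain.append(x)
    print("estimated mean:", np.mean(chain[5000:]))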
( 2
min )
We present a novel deep learning method for estimating time-dependent
parameters in Markov processes through discrete sampling. Departing from
conventional machine learning, our approach reframes parameter approximation as
an optimization problem based on maximum likelihood. Experimental validation
focuses on parameter estimation in multivariate regression and stochastic
differential equations (SDEs). Theoretical results show that, under specific
conditions, the true solution is close to that of the SDE whose parameters are
approximated by our neural network. Our work contributes to SDE-based model
parameter estimation, offering a versatile tool for diverse fields.
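A toy PyTorch sketch of the idea under our own assumptions (an Ornstein-Uhlenbeck-type SDE with a time-dependent drift coefficient; the paper's architecture and targets may differ): a small network maps time to the parameter, and training maximizes the Euler-Maruyama transition likelihood.

    import torch

    torch.manual_seed(0)
    dt, sigma, T = 0.01, 0.3, 1000
    theta_true = lambda t: 1.0 + torch.sin(2 * torch.pi * t)

    # Simulate one path of dX = -theta(t) X dt + sigma dW (Euler-Maruyama)
    t = torch.arange(T) * dt
    x = torch.zeros(T)
    for i in range(T - 1):
        x[i + 1] = x[i] - theta_true(t[i]) * x[i] * dt \
                   + sigma * dt ** 0.5 * torch.randn(())

    # Small network mapping time to a positive drift parameter
    net = torch.nn.Sequential(torch.nn.Linear(1, 32), torch.nn.Tanh(),
                              torch.nn.Linear(32, 1), torch.nn.Softplus())
    opt = torch.optim.Adam(net.parameters(), lr=1e-2)

    for _ in range(500):
        theta = net(t[:-1].unsqueeze(1)).squeeze(-1)
        mean = x[:-1] - theta * x[:-1] * dt          # one-step transition mean
        nll = 0.5 * ((x[1:] - mean) ** 2).sum() / (sigma ** 2 * dt)
        opt.zero_grad(); nll.backward(); opt.step()  # Gaussian NLL up to a constant

    print("theta(0.5): true 1.0, est", float(net(torch.tensor([[0.5]]))))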
( 2
min )
We are excited to announce the launch of Amazon DocumentDB (with MongoDB compatibility) integration with Amazon SageMaker Canvas, allowing Amazon DocumentDB customers to build and use generative AI and machine learning (ML) solutions without writing code. Amazon DocumentDB is a fully managed native JSON document database that makes it straightforward and cost-effective to operate critical […]
( 9
min )
“Minimum viewing time” benchmark gauges image recognition complexity for AI systems by measuring the time needed for accurate human identification.
( 11
min )
Using generative AI, MIT chemists created a model that can predict the structures formed when a chemical reaction reaches its point of no return.
( 9
min )
Generative Artificial Intelligence (AI) is one of the most exciting
developments in Computer Science of the last decade. At the same time,
Reinforcement Learning (RL) has emerged as a very successful paradigm for a
variety of machine learning tasks. In this survey, we discuss the state of the
art, opportunities and open research questions in applying RL to generative AI.
In particular, we will discuss three types of applications, namely, RL as an
alternative way for generation without specified objectives; as a way for
generating outputs while concurrently maximizing an objective function; and,
finally, as a way of embedding desired characteristics, which cannot be easily
captured by means of an objective function, into the generative process. We
conclude the survey with an in-depth discussion of the opportunities and
challenges in this fascinating emerging area.
( 2
min )
In distributed training, communication often emerges as a bottleneck. In
response, we introduce Kimad, a solution that offers adaptive gradient
compression. By consistently monitoring bandwidth, Kimad refines compression
ratios to match specific neural network layer requirements. Our exhaustive
tests and proofs confirm Kimad's outstanding performance, establishing it as a
benchmark in adaptive compression for distributed deep learning.
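The sketch below illustrates the general idea with a simple top-k compressor whose keep-ratio is chosen per layer from a measured bandwidth budget; the rule and names are our own placeholders, not Kimad's actual policy.

    import numpy as np

    def topk_compress(grad, ratio):
        # Keep only the largest-magnitude fraction `ratio` of entries.
        flat = grad.ravel()
        k = max(1, int(ratio * flat.size))
        idx = np.argpartition(np.abs(flat), -k)[-k:]
        return idx, flat[idx]                  # indices + values to transmit

    def adaptive_ratio(layer_size, bandwidth_bps, deadline_s, bytes_per_entry=8):
        # Largest keep-ratio whose transmission fits the bandwidth budget;
        # a simple illustrative rule, not Kimad's actual policy.
        budget_entries = bandwidth_bps / 8 * deadline_s / bytes_per_entry
        return float(np.clip(budget_entries / layer_size, 0.01, 1.0))

    grad = np.random.default_rng(0).standard_normal(1_000_000)
    r = adaptive_ratio(grad.size, bandwidth_bps=100e6, deadline_s=0.05)
    idx, vals = topk_compress(grad, r)
    print(f"keep ratio {r:.3f}, sent {vals.size} of {grad.size} entries")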
( 2
min )
Quantum neural networks (QNNs) and quantum kernels are prominent quantum
machine learning models that aim to leverage the nascent capabilities of
near-term quantum computers to tackle classical machine learning challenges.
Nonetheless, limited training efficiency constrains both QNNs and quantum
kernels when they are applied to extensive datasets. To confront this concern,
we present a unified approach: coreset selection, which expedites the training
of QNNs and quantum kernels by distilling an informative subset from the
original training dataset. Furthermore, we analyze the generalization error
bounds of QNNs and quantum kernels trained on such coresets, showing
performance comparable to training on the complete original dataset. Through
systematic numerical simulations, we illustrate the potential of coreset
selection in expediting tasks encompassing synthetic data classification,
identification of quantum correlations, and quantum compiling. Our work offers
a useful way to improve diverse quantum machine learning models with a
theoretical guarantee while reducing the training cost.
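For a flavor of coreset selection in classical code (a standard greedy k-center heuristic; the paper's quantum-specific criterion may differ), the sketch below picks m points that cover the dataset in feature space.

    import numpy as np

    def greedy_k_center_coreset(X, m, rng=None):
        # Repeatedly add the point farthest from the current coreset,
        # so the m selected points cover the dataset in feature space.
        rng = rng or np.random.default_rng(0)
        idx = [int(rng.integers(len(X)))]
        d = np.linalg.norm(X - X[idx[0]], axis=1)
        for _ in range(m - 1):
            j = int(np.argmax(d))
            idx.append(j)
            d = np.minimum(d, np.linalg.norm(X - X[j], axis=1))
        return np.array(idx)

    X = np.random.default_rng(1).standard_normal((1000, 8))
    print(greedy_k_center_coreset(X, m=10))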
( 2
min )
We present a new method for functional tissue unit segmentation at the
cellular level, which utilizes the latest deep learning semantic segmentation
approaches together with domain adaptation and semi-supervised learning
techniques. This approach minimizes the domain gap and class imbalance, and
accounts for the influence of differing capture settings between the HPA and
HubMAP datasets. The presented approach achieves results comparable with the
state of the art in functional tissue unit segmentation at the cellular level.
The source code is available at
https://github.com/VSydorskyy/hubmap_2022_htt_solution
( 2
min )
We consider decentralized learning for zero-sum games, where players only see
their payoff information and are agnostic to actions and payoffs of the
opponent. Previous works demonstrated convergence to a Nash equilibrium in this
setting using double time-scale algorithms under strong reachability
assumptions. We address the open problem of achieving an approximate Nash
equilibrium efficiently with an uncoupled and single time-scale algorithm under
weaker conditions. Our contribution is a rational and convergent algorithm,
utilizing Tsallis-entropy regularization in a value-iteration-based approach.
The algorithm learns an approximate Nash equilibrium in polynomial time,
requiring only the existence of a policy pair that induces an irreducible and
aperiodic Markov chain, thus considerably weakening past assumptions. Our
analysis leverages negative drift inequalities and introduces novel properties
of Tsallis entropy that are of independent interest.
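For reference, the Tsallis entropy of a policy $\pi$ over actions, with parameter $q > 0$, is commonly defined as $S_q(\pi) = \big(1 - \sum_a \pi(a)^q\big)/(q - 1)$, recovering the Shannon entropy in the limit $q \to 1$; the abstract does not specify which $q$ the algorithm uses.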
( 2
min )
This paper extends our previous method for COVID-19 diagnosis, proposing an
enhanced solution for detecting COVID-19 from computed tomography (CT) images.
To decrease model misclassifications, two key steps of image processing were
employed. Firstly, the uppermost and lowermost slices were removed, preserving
sixty percent of each patient's slices. Secondly, all slices underwent manual
cropping to emphasize the lung areas. Subsequently, resized CT scans (224 by
224) were input into an Xception transfer learning model. Leveraging Xception's
architecture and pre-trained weights, the modified model achieved binary
classification. Promising results on the COV19-CT database showcased higher
validation accuracy and macro F1 score at both the slice and patient levels
compared to our previous solution and alternatives on the same dataset.
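A minimal Keras sketch of the described transfer-learning setup, with head layers and hyperparameters that are our assumptions rather than the paper's exact configuration:

    import tensorflow as tf

    base = tf.keras.applications.Xception(
        weights="imagenet", include_top=False, input_shape=(224, 224, 3))
    base.trainable = False                    # freeze pre-trained features

    model = tf.keras.Sequential([
        base,
        tf.keras.layers.GlobalAveragePooling2D(),
        tf.keras.layers.Dropout(0.3),         # assumed regularization choice
        tf.keras.layers.Dense(1, activation="sigmoid"),  # COVID vs. non-COVID
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy",
                  metrics=["accuracy"])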
( 2
min )
Cadastres from the 19th century are a rich but complex source for historians
and archaeologists, and their use presents great challenges. For
archaeological and historical remote sensing, we have trained several Deep
Learning models, CNNs as well as Vision Transformers, to extract large-scale
data from this knowledge representation. We present the principal results of
our work here, together with a demonstrator of our browser-based tool that
allows researchers and public stakeholders to quickly identify spots that
featured buildings in the 19th-century Franciscean Cadastre. The tool not only
supports scholars and fellow researchers in building a better understanding of
the settlement history of the region of Styria, it also helps public
administration and fellow citizens to swiftly identify areas of heightened
sensitivity with regard to the cultural heritage of the region.
( 2
min )
Popular guidance for denoising diffusion probabilistic models (DDPMs) linearly
combines distinct conditional models to provide enhanced control over samples.
However, this approach overlooks nonlinear effects that become significant
when the guidance scale is large. To address this issue, we propose
characteristic guidance, a novel method that provides non-linear correction for
classifier-free guided DDPMs. Such correction forces the guided DDPMs to
respect the Fokker-Planck equation of their underlying diffusion process, in a
way that is first-principle, training-free, derivative-free, and compatible
with existing sampling methods. Experiments show that characteristic guidance
is robust to various applications, offers enhanced control over sample
generation, suppresses color and exposure issues even for latent space
sampling, and can handle physics problems such as phase transitions.
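For reference, the linear combination being corrected is standard classifier-free guidance, sketched below; characteristic guidance itself replaces this with a nonlinear, Fokker-Planck-consistent correction whose details are in the paper.

    def cfg_noise_estimate(eps_cond, eps_uncond, w):
        # Standard classifier-free guidance: linearly combine the conditional
        # and unconditional noise estimates with guidance scale w.
        return (1 + w) * eps_cond - w * eps_uncond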
( 2
min )
Likelihood-free inference is quickly emerging as a powerful tool to perform
fast/effective parameter estimation. We demonstrate a technique of optimizing
likelihood-free inference to make it even faster by marginalizing symmetries in
a physical problem. In this approach, physical symmetries, for example
time-translation, are learned via joint-embedding self-supervised learning
with symmetry-based data augmentations. Subsequently, parameter inference is
performed using a normalizing flow where the embedding network is used to
summarize the data before conditioning the parameters. We present this approach
on two simple physical problems and show faster convergence with a smaller
number of parameters compared to a normalizing flow that does not use a
pre-trained symmetry-informed representation.
( 2
min )
The utilization of deep learning-based object detection is an effective
approach to assist visually impaired individuals in avoiding obstacles. In this
paper, we implemented seven different YOLO object detection models
\textit{viz}., YOLO-NAS (small, medium, large), YOLOv8, YOLOv7, YOLOv6, and
YOLOv5, and performed a comprehensive evaluation with carefully tuned
hyperparameters to analyze how these models performed on images containing
common daily-life objects found on roads and sidewalks. After a systematic
investigation, YOLOv8 was found to be the best model, which reached a precision
of $80\%$ and a recall of $68.2\%$ on a well-known Obstacle Dataset which
includes images from VOC dataset, COCO dataset, and TT100K dataset along with
images collected by the researchers in the field. Despite being the latest
model and demonstrating better performance in many other applications, YOLO-NAS
was found to be suboptimal for the obstacle detection task.
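An illustrative inference sketch with the public Ultralytics API (the checkpoint and image path are placeholders, not the study's artifacts):

    from ultralytics import YOLO

    model = YOLO("yolov8n.pt")                 # pre-trained checkpoint
    results = model.predict("sidewalk.jpg", conf=0.25)
    for r in results:
        for box in r.boxes:
            print(model.names[int(box.cls)], float(box.conf))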
( 2
min )
Sleep detection and annotation are crucial for researchers to understand
sleep patterns, especially in children. With modern wrist-worn watches
featuring built-in accelerometers, sleep logs can be collected. However,
annotating these logs with distinct sleep events, onset and wakeup, proves to
be challenging. These annotations must be automated, precise, and scalable. We
propose to model the accelerometer data using different machine learning (ML)
techniques such as support vector machines, boosting, ensemble methods, and
more complex approaches involving LSTMs and region-based CNNs. We then aim to
evaluate these approaches using the Event Detection Average Precision (EDAP)
score (similar to the IoU metric) to eventually compare their predictive power
and model performance.
( 2
min )
Safeguarding privacy in sensitive training data is paramount, particularly in
the context of generative modeling. This is typically done either through
differentially private stochastic gradient descent or by training models or
generators with a differentially private metric. In this paper, we introduce a novel
differentially private generative modeling approach based on parameter-free
gradient flows in the space of probability measures. The proposed algorithm is
a new discretized flow which operates through a particle scheme, utilizing
drift derived from the sliced Wasserstein distance and computed in a private
manner. Our experiments show that compared to a generator-based model, our
proposed model can generate higher-fidelity data at a low privacy budget,
offering a viable alternative to generator-based approaches.
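The non-private core of such a drift is the sliced Wasserstein distance; a Monte Carlo sketch for equally sized point clouds is below (the paper's private computation adds calibrated noise, which is omitted here).

    import numpy as np

    def sliced_wasserstein(X, Y, n_proj=100, rng=None):
        # Average 1D 2-Wasserstein distances over random projections;
        # assumes X and Y contain equally many points.
        rng = rng or np.random.default_rng(0)
        theta = rng.standard_normal((n_proj, X.shape[1]))
        theta /= np.linalg.norm(theta, axis=1, keepdims=True)
        px = np.sort(X @ theta.T, axis=0)     # sorted 1D projections
        py = np.sort(Y @ theta.T, axis=0)
        return np.sqrt(np.mean((px - py) ** 2))

    rng = np.random.default_rng(1)
    print(sliced_wasserstein(rng.normal(0, 1, (500, 2)),
                             rng.normal(1, 1, (500, 2))))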
( 2
min )
Influenced mixed moving average fields are a versatile modeling class for
spatio-temporal data. However, their predictive distribution is not generally
known. Under this modeling assumption, we define a novel spatio-temporal
embedding and a theory-guided machine learning approach that employs a
generalized Bayesian algorithm to make ensemble forecasts. We employ Lipschitz
predictors and determine fixed-time and any-time PAC Bayesian bounds in the
batch learning setting. Causal forecasting is a highlight of our methodology,
as is its potential application to data with short- and long-range dependence
in space and time. We then test the performance of our learning
methodology by using linear predictors and data sets simulated from a
spatio-temporal Ornstein-Uhlenbeck process.
( 2
min )
The randomly pivoted partial Cholesky algorithm (RPCholesky) computes a
factorized rank-k approximation of an N x N positive-semidefinite (psd) matrix.
RPCholesky requires only (k + 1) N entry evaluations and O(k^2 N) additional
arithmetic operations, and it can be implemented with just a few lines of code.
The method is particularly useful for approximating a kernel matrix.
This paper offers a thorough new investigation of the empirical and
theoretical behavior of this fundamental algorithm. For matrix approximation
problems that arise in scientific machine learning, experiments show that
RPCholesky matches or beats the performance of alternative algorithms.
Moreover, RPCholesky provably returns low-rank approximations that are nearly
optimal. The simplicity, effectiveness, and robustness of RPCholesky strongly
support its use in scientific computing and machine learning applications.
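Since the abstract stresses that the method fits in a few lines of code, here is a sketch following the commonly stated form of the algorithm (the toy kernel and sizes are our own):

    import numpy as np

    def rpcholesky(A_cols, diag, k, rng=None):
        # F (N x k) with A ~= F @ F.T; touches only the diagonal plus k columns.
        rng = rng or np.random.default_rng(0)
        N = len(diag)
        F, d = np.zeros((N, k)), diag.astype(float).copy()
        for i in range(k):
            s = int(rng.choice(N, p=d / d.sum()))  # pivot ~ residual diagonal
            g = A_cols(s) - F[:, :i] @ F[s, :i]    # residual column at pivot
            F[:, i] = g / np.sqrt(g[s])
            d = np.maximum(d - F[:, i] ** 2, 0.0)
        return F

    X = np.random.default_rng(1).standard_normal((500, 3))
    K = np.exp(-0.5 * ((X[:, None] - X[None]) ** 2).sum(-1))   # RBF kernel
    F = rpcholesky(lambda s: K[:, s], np.ones(500), k=50)
    print("relative error:", np.linalg.norm(K - F @ F.T) / np.linalg.norm(K))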
( 2
min )
With the rise of voice search, how can businesses adapt their SEO strategies to optimize for conversational queries, backed by data-driven insights? Voice search is causing changes to occur in search engine optimization. Users are using more natural language and conversational queries with voice-activated devices. Businesses need to adjust SEO strategies for changing search behavior. …
( 26
min )
The Energy and Climate Hack presented opportunities for students and companies to collaborate and develop innovative solutions.
( 8
min )
Amazon SageMaker Studio offers a broad set of fully managed integrated development environments (IDEs) for machine learning (ML) development, including JupyterLab, Code Editor based on Code-OSS (Visual Studio Code Open Source), and RStudio. It provides access to the most comprehensive set of tools for each step of ML development, from preparing data to building, training, […]
( 16
min )
This is a customer post jointly authored by ICL and AWS employees. ICL is a multi-national manufacturing and mining corporation based in Israel that manufactures products based on unique minerals and fulfills humanity’s essential needs, primarily in three markets: agriculture, food, and engineered materials. Their mining sites use industrial equipment that has to be monitored […]
( 8
min )
Amazon Comprehend is a natural-language processing (NLP) service that provides pre-trained and custom APIs to derive insights from textual data. Amazon Comprehend customers can train custom named entity recognition (NER) models to extract entities of interest, such as location, person name, and date, that are unique to their business. To train a custom model, you […]
( 8
min )
Text-to-image generation is a rapidly growing field of artificial intelligence with applications in a variety of areas, such as media and entertainment, gaming, ecommerce product visualization, advertising and marketing, architectural design and visualization, artistic creations, and medical imaging. Stable Diffusion is a text-to-image model that empowers you to create high-quality images within seconds. In November […]
( 9
min )
This post outlines the ETL pipeline we developed for feature processing for training and deploying a job recommender model at Talent.com. Our pipeline uses SageMaker Processing jobs for efficient data processing and feature extraction at a large scale. Feature extraction code is implemented in Python enabling the use of popular ML libraries to perform feature extraction at scale, without the need to port the code to use PySpark.
( 10
min )
This GFN Thursday is burning rubber with the latest Forza Horizon games from Microsoft Studios. Check them out on PC Game Pass. Plus, give the gift of cloud gaming with the latest membership bundle, which includes a free, three-month PC Game Pass subscription with the purchase of a six-month GeForce NOW Ultimate membership.
( 6
min )
The Center for Business Analytics at the University of Cincinnati will present its annual Data Science Symposium 2022 on November 8. This all-day in-person event will have three featured speakers and two tech talk tracks with four concurrent presentations in each track. The […]
( 10
min )